

BIOINFORMATICS SOFTWARE AND DATABASES
-
CanProVar
CanProVar stores and displays single amino acid alterations in the human proteome, including both germline and somatic variations, especially those reported in the published literature as related to the genesis or development of human cancer. Cancer-related variations and corresponding annotations can be queried through the web interface using protein IDs from the Ensembl, IPI, RefSeq, and UniProt/Swiss-Prot databases, or using gene names and Entrez Gene IDs. FASTA files with variation information are also available for download.
-
CEA
CEA (Complex-Enrichment Analysis) uses a protein interaction network-assisted approach to improve protein identification in shotgun proteomics. In shotgun proteomics data analysis, a large proportion of possible proteins are eliminated because of insufficient experimental evidence. CEA can rescue these eliminated proteins based on a simple assumption: a possible protein is more likely to be present in the original sample if it belongs to a complex enriched with confidently identified proteins. In the data sets tested, CEA increased protein identifications by 10-30% with an estimated accuracy of 85%.
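The rescue assumption above can be illustrated with a minimal sketch. It is not CEA's actual statistic: "enriched" is approximated here by a simple fraction cutoff, and the function name, the parameters `min_fraction` and `min_confident`, and the input layout are all hypothetical.

```python
def rescue_proteins(complexes, confident, possible,
                    min_fraction=0.5, min_confident=2):
    """Return possible proteins that sit in a complex rich in confident ones.

    complexes: dict mapping a complex name to its member protein IDs.
    confident: set of confidently identified protein IDs.
    possible:  set of protein IDs eliminated for insufficient evidence.
    The fraction cutoff is an illustrative stand-in for a real enrichment test.
    """
    rescued = set()
    for members in complexes.values():
        members = set(members)
        hits = members & confident
        # A complex counts as "enriched" when enough members are confident.
        if len(hits) >= min_confident and len(hits) / len(members) >= min_fraction:
            rescued |= members & possible
    return rescued


complexes = {"c1": ["A", "B", "C", "X"], "c2": ["D", "Y", "Z", "W"]}
rescued = rescue_proteins(complexes, {"A", "B", "C", "D"}, {"X", "Y"})
```

A statistical test (for example a hypergeometric test against the whole network) would be a more defensible definition of enrichment than a fixed fraction.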
-
DirecTag
Sequence tagging is designed to complement database search tools like MyriMatch by providing partial explanations for experimental data. The partial explanations (tags) can later be used to reconcile against a protein database, usually allowing for modifications and/or mutations because of the extreme filtration of the search space by the tags. DirecTag has been tested with many instrument types. It can parallelize its task across multiple processors via threading or compute nodes via the Message Passing Interface.
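As a rough illustration of what a sequence tag is, the sketch below reads short tags from a peak list by chaining peaks whose mass differences match amino acid residue masses. This shows the general idea of tagging only, not DirecTag's algorithm or scoring; the function name, the tolerance, and the reduced residue table are assumptions made for the example.

```python
# Monoisotopic residue masses in Da; a small subset chosen for illustration.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "V": 99.06841,
    "L": 113.08406,
}


def find_tags(peaks, tag_length=3, tol=0.02):
    """Return sorted tags of tag_length residues supported by peak-to-peak gaps."""
    peaks = sorted(peaks)

    def extend(last_mz, tag):
        if len(tag) == tag_length:
            yield tag
            return
        for mz in peaks:
            gap = mz - last_mz
            if gap <= 0:
                continue
            for residue, mass in RESIDUE_MASS.items():
                # A gap equal to a residue mass adds that residue to the tag.
                if abs(gap - mass) <= tol:
                    yield from extend(mz, tag + residue)

    tags = set()
    for start in peaks:
        tags.update(extend(start, ""))
    return sorted(tags)


tags = find_tags([100.0, 157.02146, 228.05857, 327.12698])
```

In this toy peak list the consecutive gaps correspond to G, A, and V, so the only three-residue tag is "GAV".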
-
ERGR
The Ethanol-Related Gene Resource (ERGR) database aims to provide a comprehensive and useful gene resource to the ethanol/alcohol research community. Currently, the ERGR database contains more than 30 large datasets from the literature and 21 mouse QTLs from public databases. These data come from 5 organisms (human, mouse, rat, fly, and worm) and were produced by multiple approaches (expression, association, linkage, QTL, literature search, etc.). Users can browse or search the database at different levels. Moreover, ERGR provides data integration (union and intersection) and candidate gene selection based on multiple datasets or organisms.
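One simple way to picture candidate gene selection across datasets is to keep the genes supported by at least a given number of datasets. The sketch below assumes that rule; it is not taken from ERGR, and the function name, the `min_datasets` parameter, and the gene labels are hypothetical.

```python
from collections import Counter


def select_candidates(datasets, min_datasets=2):
    """Return genes that appear in at least min_datasets of the given datasets.

    datasets: dict mapping a dataset name to an iterable of gene symbols.
    min_datasets=1 gives the union; min_datasets=len(datasets) gives the
    intersection.
    """
    counts = Counter(gene for genes in datasets.values() for gene in set(genes))
    return sorted(gene for gene, n in counts.items() if n >= min_datasets)


datasets = {
    "expression": ["G1", "G2", "G3"],
    "association": ["G2", "G3"],
    "qtl": ["G3", "G4"],
}
candidates = select_candidates(datasets, min_datasets=2)
```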
-
GOTM
GOTM is a Gene Ontology (GO) enrichment analysis tool. It compares a user-uploaded gene list with all GO categories to identify those categories that are enriched for genes in the list. The result is visualized as a directed acyclic graph (DAG) in order to preserve the relationships among the enriched GO categories. It is designed for the quick analysis of gene lists generated from microarray, proteomics, and other large-scale studies.
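A common way to test a category for enrichment is the hypergeometric test, sketched below. The text does not say which statistic GOTM uses, so this is an assumption; the function names, the `alpha` cutoff, and the toy gene labels are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the category."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def enriched_categories(gene_list, categories, background, alpha=0.05):
    """Return (category, overlap, p-value) tuples with p < alpha, best first."""
    background = set(background)
    genes = set(gene_list) & background
    results = []
    for name, members in categories.items():
        members = set(members) & background
        overlap = len(genes & members)
        p = hypergeom_pvalue(overlap, len(genes), len(members), len(background))
        if p < alpha:
            results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


background = [f"g{i}" for i in range(20)]
categories = {
    "cat1": [f"g{i}" for i in range(5)],
    "cat2": [f"g{i}" for i in range(10, 20)],
}
hits = enriched_categories(["g0", "g1", "g2", "g3", "g10"], categories, background)
```

Here "cat1" contains 4 of the 5 uploaded genes and passes the cutoff, while "cat2" contains only 1 and does not.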
-
IDPicker
Protein assembly is the process of transforming raw peptide identifications into confident protein identifications. IDPicker infers score thresholds to achieve target false discovery rates (FDR) among peptide identifications. The software can remove artifactual proteins for more parsimonious protein reporting. IDPicker's HTML protein reports cluster indiscernible proteins and those that share peptides, making it far easier to learn the biological lessons presented by each sample. It applies a user-specified experimental hierarchy to collections of peptide identifications and enables users to track the number of spectra or peptides observed for a given protein across multiple experiments.
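The idea of choosing a score threshold for a target FDR can be sketched as follows. The sketch assumes a target-decoy search in which FDR is estimated as decoys divided by targets above the threshold; the text does not describe IDPicker's estimator, so this rule, the function name, and the input layout are assumptions.

```python
def score_threshold_for_fdr(psms, max_fdr=0.05):
    """Return the lowest score threshold whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs, higher score meaning better.
    Returns None when no threshold satisfies the target.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = None
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after the last identification sharing this score.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if targets and decoys / targets <= max_fdr:
            best = score
    return best


psms = [(10, False), (9, False), (8, False), (7, False),
        (6, True), (5, False), (4, True)]
threshold = score_threshold_for_fdr(psms, max_fdr=0.25)
```

Identifications scoring at or above the returned threshold would be kept; a stricter target yields a higher threshold.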
-
MyriMatch
MyriMatch is a tool designed to take experimental data from shotgun proteomics experiments and compare those spectra against sequences in a known database of proteins. Whether the program is run on a single computer or across an entire cluster of processing nodes, it divides work more efficiently than many other database search programs. This is because it generates candidate sequences from the known database only once for the entire set of spectra, instead of once for every spectrum. Each candidate sequence, once generated, is compared against every spectrum. Each spectrum keeps a user-defined number of the highest-scoring candidate sequences.
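The loop order described above (candidates outside, spectra inside, with a bounded list of best matches per spectrum) can be sketched as follows. This is an illustration of the control flow only: the function name, the `max_results` parameter, and the toy scoring function are hypothetical and unrelated to MyriMatch's real scoring.

```python
import heapq


def search(candidates, spectra, score, max_results=2):
    """Score each candidate once per spectrum, keeping the top hits per spectrum.

    candidates: iterable of candidate sequences, consumed exactly once.
    spectra:    dict mapping a spectrum ID to its data.
    score:      callable (sequence, spectrum) -> number, higher is better.
    """
    top = {sid: [] for sid in spectra}
    for index, sequence in enumerate(candidates):
        for sid, spectrum in spectra.items():
            entry = (score(sequence, spectrum), index, sequence)
            heap = top[sid]  # min-heap holding the current best entries
            if len(heap) < max_results:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                heapq.heapreplace(heap, entry)
    return {
        sid: [(seq, s) for s, _, seq in sorted(heap, reverse=True)]
        for sid, heap in top.items()
    }


def toy_score(sequence, spectrum):
    """Toy stand-in for a real scorer: count residues present in the spectrum set."""
    return sum(ch in spectrum for ch in sequence)


spectra = {"s1": set("PEPTIDE"), "s2": set("KR")}
results = search(iter(["PEPTIDE", "PEKR", "KRKR", "AAA"]), spectra, toy_score)
```

Because the candidates are passed as a one-shot iterator, the example also shows that the candidate list never needs to be regenerated per spectrum.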
-
SZGR
The Schizophrenia Gene Resource (SZGR) provides a comprehensive online resource for schizophrenia genetics studies. SZGR currently collects and integrates genes related to schizophrenia from the following data sources and annotations: association studies, linkage analysis, gene expression, literature search, Gene Ontology annotations, protein-protein interaction networks, KEGG pathways, and microRNAs and their target sites. SZGR also provides an online candidate gene ranking tool that helps users prioritize candidate genes using pre-optimized or custom weighting schemes.
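Ranking by a weighting scheme can be pictured as a weighted sum over evidence sources, as in the sketch below. The scoring rule, the function name, the weights, and the gene labels are all hypothetical; the text does not describe how SZGR combines its weights.

```python
def rank_genes(evidence, weights):
    """Rank genes by the weighted sum of their evidence values.

    evidence: dict mapping a gene to {source name: evidence value}.
    weights:  dict mapping a source name to its weight; unknown sources count 0.
    Ties are broken alphabetically by gene name.
    """
    scores = {
        gene: sum(weights.get(source, 0.0) * value
                  for source, value in sources.items())
        for gene, sources in evidence.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


evidence = {
    "GENE_A": {"association": 1, "linkage": 1},
    "GENE_B": {"expression": 1},
    "GENE_C": {"association": 1, "expression": 1, "literature": 1},
}
custom_weights = {"association": 3.0, "linkage": 2.0,
                  "expression": 1.0, "literature": 1.0}
ranking = rank_genes(evidence, custom_weights)
```

Swapping in a different `weights` dictionary corresponds to choosing a different custom scheme.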
-
WebGestalt
WebGestalt is a WEB-based GEne SeT AnaLysis Toolkit. It incorporates information from different public resources and provides an easy way for biologists to make sense out of large sets of genes. WebGestalt has four main modules. The gene set annotation module retrieves annotation data from 20 available options in an automated way for a gene set. The gene set organization/visualization module organizes and visualizes gene sets using eight sub-modules: GOTree, Tissue Expression Bar Chart, Chromosomal Distribution Chart, KEGG Table and Maps, BioCarta Table and Maps, Protein Domain Table, PubMed Table and GRIF Table. The gene set statistics module automatically chooses appropriate statistical methods to suggest important biological areas that warrant further study. The gene set manipulation module provides tools for uploading, saving, retrieving and deleting gene sets, as well as tools for the Boolean operation to generate union, intersection or difference between gene sets.
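The Boolean operations offered by the gene set manipulation module map directly onto set operations. A minimal sketch, with a hypothetical function name and made-up gene labels:

```python
def combine_gene_sets(first, second, operation):
    """Combine two gene sets by 'union', 'intersection', or 'difference'."""
    first, second = set(first), set(second)
    operations = {
        "union": first | second,
        "intersection": first & second,
        "difference": first - second,  # genes in first but not in second
    }
    if operation not in operations:
        raise ValueError(f"unknown operation: {operation}")
    return operations[operation]


set_a = ["G1", "G2", "G3"]
set_b = ["G3", "G4"]
shared = combine_gene_sets(set_a, set_b, "intersection")
```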
-
BIOINFORMATICS GROUPS