698 Publications

Hierarchical Multi-Label Prediction of Gene Function

Z. Barutcuoglu, R. Schapire, O. Troyanskaya

Abstract
Motivation: Assigning functions to unknown genes based on diverse large-scale data is a key task in functional genomics. Previous work on gene function prediction has addressed this problem using independent classifiers for each function. However, such an approach ignores the structure of functional class taxonomies, such as the Gene Ontology (GO). Over a hierarchy of functional classes, a group of independent classifiers, each predicting gene membership in a particular class, can produce a hierarchically inconsistent set of predictions: for a given gene, a specific class may be predicted positive while its inclusive parent class is predicted negative. Taking the hierarchical structure into account resolves such inconsistencies and provides an opportunity to leverage all classifiers in the hierarchy to achieve higher specificity of predictions.
Results: We developed a Bayesian framework for combining multiple classifiers based on the functional taxonomy constraints. Using a hierarchy of support vector machine (SVM) classifiers trained on multiple data types, we combined predictions in our Bayesian framework to obtain the most probable consistent set of predictions. Experiments show that over a 105-node subhierarchy of the GO, our Bayesian framework improves predictions for 93 nodes. As an additional benefit, our method also provides implicit calibration of SVM margin outputs to probabilities. Using this method, we make function predictions for multiple proteins, and experimentally confirm predictions for proteins involved in mitosis.
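The inconsistency described above can be made concrete with a small check. The sketch below is illustrative only and is not the authors' Bayesian framework: it assumes a hypothetical toy hierarchy stored as a child-to-parents map (the class names are placeholders, not actual GO structure), flags every positive class whose parent is predicted negative, and shows one naive repair (adding all ancestors of each positive class).

```python
from typing import Dict, List, Set, Tuple


def inconsistent_pairs(parents: Dict[str, List[str]],
                       predicted: Set[str]) -> List[Tuple[str, str]]:
    """Return (child, parent) pairs where the child class is predicted
    positive but one of its immediate parent classes is predicted negative."""
    pairs = []
    for child in sorted(predicted):
        for parent in parents.get(child, []):
            if parent not in predicted:
                pairs.append((child, parent))
    return pairs


def close_upward(parents: Dict[str, List[str]],
                 predicted: Set[str]) -> Set[str]:
    """Naive repair (an assumption, not the paper's method): trust every
    positive prediction and add all of its ancestors to the positive set."""
    result = set(predicted)
    stack = list(predicted)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


# Hypothetical toy hierarchy: mitosis -> cell cycle -> root
toy_parents = {"mitosis": ["cell cycle"], "cell cycle": ["root"]}
print(inconsistent_pairs(toy_parents, {"mitosis", "root"}))
print(sorted(close_upward(toy_parents, {"mitosis"})))
```

Upward closure always trusts positive calls; the framework described in the abstract instead selects the most probable consistent set of labels, which may just as well turn an unreliable positive child prediction into a negative one.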


Discovery of Biological Networks from Diverse Functional Genomic Data

C. Myers, D. Robson, A. Wible, M. Hibbs, C. Chiriac, C. Theesfeld, K. Dolinski, O. Troyanskaya

We have developed a general probabilistic system for query-based discovery of pathway-specific networks through integration of diverse genome-wide data. This framework was validated by accurately recovering known networks for 31 biological processes in Saccharomyces cerevisiae and by experimentally verifying predictions for the process of chromosomal segregation. Our system, bioPIXIE, a public, comprehensive system for integration, analysis, and visualization of biological network predictions for S. cerevisiae, is freely accessible over the World Wide Web.
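Probabilistic integration of several data sources can be illustrated with a naive Bayes combination, in which each source contributes a likelihood ratio to the posterior odds that two genes are functionally related. This is a minimal sketch under a stated assumption (the data sources are conditionally independent given the true relationship) and is not presented as bioPIXIE's actual model; the function name and the likelihood values are hypothetical.

```python
import math
from typing import List, Tuple


def posterior(prior: float, likelihoods: List[Tuple[float, float]]) -> float:
    """Naive Bayes evidence combination (assumed conditionally independent sources).

    prior        -- P(related) before seeing any data, strictly between 0 and 1
    likelihoods  -- one (P(obs | related), P(obs | unrelated)) tuple per data
                    source, evaluated at the value observed in that source
    Returns P(related | all observations).
    """
    log_odds = math.log(prior / (1.0 - prior))
    for p_pos, p_neg in likelihoods:
        # Each independent source multiplies the odds by its likelihood ratio.
        log_odds += math.log(p_pos / p_neg)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical evidence: coexpression favors a relationship 4:1,
# a physical-interaction assay favors it 2:1.
print(posterior(0.5, [(0.8, 0.2), (0.6, 0.3)]))
```

Working in log-odds keeps the product of many likelihood ratios numerically stable. In practice the per-source likelihood tables would have to be estimated from gene pairs of known status, which this sketch does not attempt.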

December 19, 2005

Putting the ‘Bio’ into Bioinformatics

A report on the 13th Annual Conference on Intelligent Systems for Molecular Biology (ISMB), Detroit, USA, 25-29 June 2005.

The annual meeting on computational methods for molecular biology brought together 1,731 attendees and covered a diverse range of topics, from sequence analysis and text mining to structural bioinformatics and pathway prediction. This year saw an increased emphasis on the biological problems that bioinformatic methods are being developed to solve; in addition to many novel developments in traditional areas of bioinformatics, a substantial number of talks focused on integrative approaches, pathway analysis, and comparative genomics. Also on the menu this year were ways of making bioinformatic methods more 'data-centric' and of making new technologies easily accessible to biologists.

September 29, 2005

Tools and Applications for Large-Scale Display Walls

G. Wallace, O. Anshus, P. Bi, H. Chen, HY. Chen, D. Clark, P. Cook, A. Finkelstein, T. Funkhouser, A. Gupta, M. Hibbs, K. Li, Z. Liu, R. Samanta, R. Sukthankar, O. Troyanskaya

Increased processor and storage capacities have supported the computational sciences, but have simultaneously unleashed a data avalanche on the scientific community. As a result, scientific research is limited by data analysis and visualization capabilities. These new bottlenecks have been the driving motivation behind the Princeton scalable display wall project. To create a scalable and easy-to-use large-format display system for collaborative visualization, the authors have developed various techniques, software tools, and applications.


Putting Microarrays in a Context: Integrated Analysis of Diverse Biological Data

In recent years, multiple types of high-throughput functional genomic data that facilitate rapid functional annotation of sequenced genomes have become available. Gene expression microarrays are the most commonly available source of such data. However, genomic data often sacrifice specificity for scale, yielding very large quantities of data of lower quality than those produced by traditional experimental methods. Sophisticated analysis methods are therefore necessary to interpret these large-scale data sets accurately. This review presents an overview of recently developed methods that integrate the analysis of microarray data with sequence, interaction, localisation and literature data, and outlines current challenges in the field. The review focuses on the use of such methods for gene function prediction, understanding of protein regulation and modelling of biological networks.


Visualization-Based Discovery and Analysis of Genomic Aberrations in Microarray Data

C. Myers, X. Chen, O. Troyanskaya

Background
Chromosomal copy number changes (aneuploidies) play a key role in cancer progression and molecular evolution. These copy number changes can be studied using microarray-based comparative genomic hybridization (array CGH) or gene expression microarrays. However, accurate identification of amplified or deleted regions requires a combination of visual and computational analysis of these microarray data.

Results
We have developed ChARMView, a visualization and analysis system for guided discovery of chromosomal abnormalities from microarray data. Our system facilitates manual or automated discovery of aneuploidies through dynamic visualization and integrated statistical analysis. ChARMView can be used with array CGH and gene expression microarray data, and multiple experiments can be viewed and analyzed simultaneously.

Conclusion
ChARMView is an effective and accurate visualization and analysis system for recognizing even small aneuploidies or subtle expression biases, identifying recurring aberrations in sets of experiments, and pinpointing functionally relevant copy number changes. ChARMView is freely available under the GNU GPL at http://function.princeton.edu/ChARMView.

December 21, 2004

Accurate Detection of Aneuploidies in Array CGH and Gene Expression Microarray Data

C. Myers, M. Dunham, S. Kung, O. Troyanskaya

MOTIVATION:
Chromosomal copy number changes (aneuploidies) are common in cell populations that undergo multiple cell divisions, including yeast strains, cell lines and tumor cells. Identification of aneuploidies is critical in evolutionary studies, where changes in copy number serve an adaptive purpose, as well as in cancer studies, where amplifications and deletions of chromosomal regions have been identified as a major pathogenetic mechanism. Aneuploidies can be studied at the whole-genome level using array CGH (a microarray-based method that measures DNA content), but their presence also affects gene expression. In gene expression microarray analysis, identification of copy number changes is especially important for preventing aberrant biological conclusions based on spurious gene expression correlations or on masked phenotypes that arise due to aneuploidies. Previously suggested approaches for aneuploidy detection from microarray data mostly focus on array CGH, address only whole-chromosome or whole-arm copy number changes, and rely on thresholds or other heuristics, making them unsuitable for fully automated general application to gene expression datasets. There is a need for a general and robust method for identification of aneuploidies of any size from both array CGH and gene expression microarray data.
RESULTS:
We present ChARM (Chromosomal Aberration Region Miner), a robust and accurate expectation-maximization based method for identification of segmental aneuploidies (partial chromosome changes) from gene expression and array CGH microarray data. Systematic evaluation of the algorithm on synthetic and biological data shows that the method is robust to noise, aneuploidal segment size and P-value cutoff. Using our approach, we identify known chromosomal changes and predict novel potential segmental aneuploidies in commonly used yeast deletion strains and in breast cancer. ChARM can be routinely used to identify aneuploidies in array CGH datasets and to screen gene expression data for aneuploidies or array biases. Our methodology is sensitive enough to detect statistically significant and biologically relevant aneuploidies even when expression or DNA content changes are subtle, as in mixed populations of cells.
AVAILABILITY:
Code is available from the authors on request and in the Web supplement at http://function.cs.princeton.edu/ChARM/
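The core idea of segmental aneuploidy detection is that a run of neighboring genes shows a shared shift in log-ratio. The sketch below illustrates that idea with a plain sliding-window z-score. It is a deliberately simpler, swapped-in technique, not ChARM's expectation-maximization method, and it assumes that log-ratios are already ordered by chromosomal position and that noise can be estimated from the standard deviation of all values. The function name and the cutoff are hypothetical.

```python
import math
from typing import List, Tuple


def flag_windows(log_ratios: List[float], window: int,
                 z_cutoff: float = 3.0) -> List[Tuple[int, float]]:
    """Sliding-window mean-shift scan (illustrative; not ChARM's EM algorithm).

    log_ratios -- per-gene log-ratios, assumed sorted by chromosomal position
    window     -- number of consecutive genes per window
    Returns (start_index, window_mean) for every window whose mean lies more
    than z_cutoff standard errors away from zero.
    """
    n = len(log_ratios)
    mean_all = sum(log_ratios) / n
    # Noise estimate from all genes (includes any aberrant segment).
    sd = math.sqrt(sum((x - mean_all) ** 2 for x in log_ratios) / (n - 1))
    se = sd / math.sqrt(window)  # standard error of a window mean
    flagged = []
    for start in range(n - window + 1):
        m = sum(log_ratios[start:start + window]) / window
        if abs(m) / se > z_cutoff:
            flagged.append((start, m))
    return flagged


# Synthetic chromosome: small alternating noise with an amplified block at genes 20-29.
data = [0.1 if i % 2 == 0 else -0.1 for i in range(40)]
for i in range(20, 30):
    data[i] = 1.0
print([start for start, _ in flag_windows(data, window=10)])
```

A fixed window only localizes a segment to the set of overlapping windows and depends on the chosen window size; the method described above is reported to be robust to segment size, which a single-window scan like this one is not.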

December 12, 2004

Gene Expression Patterns in Ovarian Carcinomas

M. Schaner, D. Ross, G. Ciaravino, T. Sørlie, O. Troyanskaya, M. Diehn, Y. Wang, G. Duran, T. Sikic, S. Caldeira, H. Skomedal, I-P. Tu, T. Hernandez-Boussard, S. Johnson, P. O'Dwyer, M. Fero, G. Kristensen, A-L. Børresen-Dale, T. Hastie, R. Tibshirani, M. van de Rijn, N. Teng, T. Longacre, D. Botstein, P. Brown, B. Sikic

We used DNA microarrays to characterize the global gene expression patterns in surface epithelial cancers of the ovary. We identified groups of genes that distinguished the clear cell subtype from other ovarian carcinomas, grade I and II from grade III serous papillary carcinomas, and ovarian from breast carcinomas. Six clear cell carcinomas were distinguished from 36 other ovarian carcinomas (predominantly serous papillary) based on their gene expression patterns. The differences may yield insights into the worse prognosis and therapeutic resistance associated with clear cell carcinomas. A comparison of the gene expression patterns in the ovarian cancers to published data of gene expression in breast cancers revealed a large number of differentially expressed genes. We identified a group of 62 genes that correctly classified all 125 breast and ovarian cancer specimens. Among the best discriminators more highly expressed in the ovarian carcinomas were PAX8 (paired box gene 8), mesothelin, and ephrin-B1 (EFNB1). Although estrogen receptor was expressed in both the ovarian and breast cancers, genes that are coregulated with the estrogen receptor in breast cancers, including GATA-3, LIV-1, and X-box binding protein 1, did not show a similar pattern of coexpression in the ovarian cancers.


Systemic and Cell Type-Specific Gene Expression Patterns in Scleroderma Skin

M. Whitfield, D. Finlay, J. Isaac Murray, O. Troyanskaya, J-T. Chi, A. Pergamenschikov, T. McCalmont, P. Brown, D. Botstein, M. Kari Connolly

We used DNA microarrays representing >12,000 human genes to characterize gene expression patterns in skin biopsies from individuals with a diagnosis of systemic sclerosis with diffuse scleroderma. We found consistent differences in the patterns of gene expression between skin biopsies from individuals with scleroderma and those from normal, unaffected individuals. The biopsies from affected individuals showed nearly indistinguishable patterns of gene expression in clinically affected and clinically unaffected tissue, even though these were clearly distinguishable from the patterns found in similar tissue from unaffected individuals. Genes characteristically expressed in endothelial cells, B lymphocytes, and fibroblasts showed differential expression between scleroderma and normal biopsies. Analysis of lymphocyte populations in scleroderma skin biopsies by immunohistochemistry suggests that the B lymphocyte signature observed on our arrays is from CD20+ B cells. These results provide evidence that scleroderma has systemic manifestations that affect multiple cell types, and they suggest genes that could be used as potential markers for the disease.


Variation in Gene Expression Patterns in Human Gastric Cancers

X. Chen, S. Leung, S. Yuen, K-M. Chu, J. Ji, R. Li, A. Chan, S. Law, O. Troyanskaya, J. Wong, S. So, D. Botstein, P. Brown

Gastric cancer is the world's second most common cause of cancer death. We analyzed gene expression patterns in 90 primary gastric cancers, 14 metastatic gastric cancers, and 22 nonneoplastic gastric tissues, using cDNA microarrays representing ∼30,300 genes. Gastric cancers were distinguished from nonneoplastic gastric tissues by characteristic differences in their gene expression patterns. We found a diversity of gene expression patterns in gastric cancer, reflecting variation in intrinsic properties of tumor and normal cells and variation in the cellular composition of these complex tissues. We identified several genes whose expression levels were significantly correlated with patient survival. The variations in gene expression patterns among cancers in different patients suggest differences in pathogenetic pathways and potential therapeutic strategies.
