645 Publications

Coordination of Growth Rate, Cell Cycle, Stress Response, and Metabolic Activity in Yeast

M. Brauer, C. Huttenhower, E. Airoldi, R. Rosenstein, J. Matese, D. Gresham, V. Boer, O. Troyanskaya, D. Botstein

We studied the relationship between growth rate and genome-wide gene expression, cell cycle progression, and glucose metabolism in 36 steady-state continuous cultures limited by one of six different nutrients (glucose, ammonium, sulfate, phosphate, uracil, or leucine). The expression of more than one quarter of all yeast genes is linearly correlated with growth rate, independent of the limiting nutrient. The subset of negatively growth-correlated genes is most enriched for peroxisomal functions, whereas positively correlated genes mainly encode ribosomal functions. Many (not all) genes associated with stress response are strongly correlated with growth rate, as are genes that are periodically expressed under conditions of metabolic cycling. We confirmed a linear relationship between growth rate and the fraction of the cell population in the G0/G1 cell cycle phase, independent of limiting nutrient. Cultures limited by auxotrophic requirements wasted excess glucose, whereas those limited on phosphate, sulfate, or ammonia did not; this phenomenon (reminiscent of the “Warburg effect” in cancer cells) was confirmed in batch cultures. Using an aggregate of gene expression values, we predict (in both continuous and batch cultures) an “instantaneous growth rate.” This concept is useful in interpreting the system-level connections among growth rate, metabolism, stress, and the cell cycle.

Predicting Gene Function in a Hierarchical Context with an Ensemble of Classifiers

Y. Guan, C. Myers, D. Hess, Z. Barutcuoglu, A. Caudy, O. Troyanskaya

BACKGROUND:
The wide availability of genome-scale data for several organisms has stimulated interest in computational approaches to gene function prediction. Diverse machine learning methods have been applied to unicellular organisms with some success, but few have been extensively tested on higher level, multicellular organisms. A recent mouse function prediction project (MouseFunc) brought together nine bioinformatics teams applying a diverse array of methodologies to mount the first large-scale effort to predict gene function in the laboratory mouse.

RESULTS:
In this paper, we describe our contribution to this project, an ensemble framework based on the support vector machine that integrates diverse datasets in the context of the Gene Ontology hierarchy. We carry out a detailed analysis of the performance of our ensemble and provide insights into which methods work best under a variety of prediction scenarios. In addition, we applied our method to Saccharomyces cerevisiae and have experimentally confirmed functions for a novel mitochondrial protein.

CONCLUSION:
Our method consistently performs among the top methods in the MouseFunc evaluation. Furthermore, it exhibits good classification performance across a variety of cellular processes and functions in both a multicellular organism and a unicellular organism, indicating its ability to discover novel biology in diverse settings.

A Critical Assessment of Mus Musculus Gene Function Prediction Using Integrated Genomic Evidence

L. Peña-Castillo, M. Tasan, C. Myers, H. Lee, T. Joshi, C. Zhang, Y. Guan, M. Leone, A. Pagnani, W. Kim, C. Krumpelman, W. Tian, G. Obozinski, Y. Qi, S. Mostafavi, G. Lin, G. Berriz, F. Gibbons, G. Lanckriet, J. Qiu, C. Grant, Z. Barutcuoglu, D. Hill, D. Warde-Farley, C. Grouios, D. Ray, J. Blake, M. Deng, M. Jordan, W. Noble, Q. Morris, J. Klein-Seetharaman, Z. Bar-Joseph, T. Chen, F. Sun, O. Troyanskaya, E. Marcotte, D. Xu, T. Hughes, F. Roth

BACKGROUND:
Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated.

RESULTS:
In this study, a standardized collection of mouse functional genomic data was assembled; nine bioinformatics teams used this data set to independently train classifiers and generate predictions of function, as defined by Gene Ontology (GO) terms, for 21,603 mouse genes; and the best performing submissions were combined in a single set of predictions. We identified strengths and weaknesses of current functional genomic data sets and compared the performance of function prediction algorithms. This analysis inferred functions for 76% of mouse genes, including 5,000 currently uncharacterized genes. At a recall rate of 20%, a unified set of predictions averaged 41% precision, with 26% of GO terms achieving a precision better than 90%.

CONCLUSION:
We performed a systematic evaluation of diverse, independently developed computational approaches for predicting gene function from heterogeneous data sources in mammals. The results show that currently available data for mammals allows predictions with both breadth and accuracy. Importantly, many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized.

Exploring the Functional Landscape of Gene Expression: Directed Search of Large Microarray Compendia

M. Hibbs, D. Hess, C. Myers, C. Huttenhower, K. Li, O. Troyanskaya

MOTIVATION:
The increasing availability of gene expression microarray technology has resulted in the publication of thousands of microarray gene expression datasets investigating various biological conditions. This vast repository is still underutilized due to the lack of methods for fast, accurate exploration of the entire compendium.

RESULTS:
We have collected Saccharomyces cerevisiae gene expression microarray data containing roughly 2400 experimental conditions. We analyzed the functional coverage of this collection and we designed a context-sensitive search algorithm for rapid exploration of the compendium. A researcher using our system provides a small set of query genes to establish a biological search context; based on this query, we weight each dataset's relevance to the context, and within these weighted datasets we identify additional genes that are co-expressed with the query set. Our method exhibits an average increase in accuracy of 273% compared to previous mega-clustering approaches when recapitulating known biology. Further, we find that our search paradigm identifies novel biological predictions that can be verified through further experimentation. Our methodology provides the ability for biological researchers to explore the totality of existing microarray data in a manner useful for drawing conclusions and formulating hypotheses, which we believe is invaluable for the research community.
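
The query-driven search described above can be illustrated with a minimal sketch. It is not the SPELL implementation: the weighting rule (mean pairwise Pearson correlation among the query genes, floored at zero), the function names, and the toy data are all assumptions made for illustration only.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = (sum(a * a for a in dx) * sum(b * b for b in dy)) ** 0.5
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def dataset_weight(dataset, query):
    """Assumed relevance weight: mean pairwise correlation among the query
    genes within this dataset, floored at zero."""
    present = [g for g in query if g in dataset]
    pairs = list(combinations(present, 2))
    if not pairs:
        return 0.0
    return max(0.0, mean(pearson(dataset[a], dataset[b]) for a, b in pairs))


def search(datasets, query):
    """Rank non-query genes by their weighted average correlation to the query."""
    totals, norm = {}, {}
    for dataset in datasets:
        weight = dataset_weight(dataset, query)
        if weight == 0.0:
            continue  # dataset judged irrelevant to this query context
        present = [q for q in query if q in dataset]
        for gene, profile in dataset.items():
            if gene in query:
                continue
            score = mean(pearson(profile, dataset[q]) for q in present)
            totals[gene] = totals.get(gene, 0.0) + weight * score
            norm[gene] = norm.get(gene, 0.0) + weight
    ranked = sorted(totals, key=lambda g: totals[g] / norm[g], reverse=True)
    return [(g, totals[g] / norm[g]) for g in ranked]


# Hypothetical toy compendium: each dataset maps gene name -> expression profile.
toy = [
    {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 2, 3, 5], "D": [4, 3, 2, 1]},
    {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [4, 1, 3, 2], "D": [1, 2, 3, 4]},
]
results = search(toy, ["A", "B"])
for gene, score in results:
    print(gene, round(score, 3))
```

In the toy data the query genes A and B are co-expressed only in the first dataset, so the second dataset receives zero weight and does not influence the ranking; gene C, which tracks the query there, ranks above the anti-correlated gene D.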

AVAILABILITY:
Our query-driven search engine, called SPELL, is available at http://function.princeton.edu/SPELL.

SUPPLEMENTARY INFORMATION:
Several additional data files, figures and discussions are available at http://function.princeton.edu/SPELL/supplement.

Context-Sensitive Data Integration and Prediction of Biological Networks

C. Myers, O. Troyanskaya

MOTIVATION:
Several recent methods have addressed the problem of heterogeneous data integration and network prediction by modeling the noise inherent in high-throughput genomic datasets, which can dramatically improve specificity and sensitivity and allow the robust integration of datasets with heterogeneous properties. However, experimental technologies capture different biological processes with varying degrees of success, and thus, each source of genomic data can vary in relevance depending on the biological process one is interested in predicting. Accounting for this variation can significantly improve network prediction, but to our knowledge, no previous approaches have explicitly leveraged this critical information about biological context.

RESULTS:
We confirm the presence of context-dependent variation in functional genomic data and propose a Bayesian approach for context-sensitive integration and query-based recovery of biological process-specific networks. By applying this method to Saccharomyces cerevisiae, we demonstrate that leveraging contextual information can significantly improve the precision of network predictions, including assignment for uncharacterized genes. We expect that this general context-sensitive approach can be applied to other organisms and prediction scenarios.

AVAILABILITY:
A software implementation of our approach is available on request from the authors.

SUPPLEMENTARY INFORMATION:
Supplementary data are available at http://avis.princeton.edu/contextPIXIE/

Nearest Neighbor Networks: Clustering Expression Data Based on Gene Neighborhoods

C. Huttenhower, A. Flamholz, J. Landis, S. Sahi, C. Myers, K. Olszewski, M. Hibbs, N. Siemers, O. Troyanskaya, H. Coller

Background
The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes).

Results
We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods.
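
The mutual-nearest-neighbor idea can be sketched in a few lines. This is a simplified illustration, not the NNN algorithm itself: it uses Euclidean distance to rank neighbors and reports connected components of the mutual-neighbor graph where the abstract describes overlapping cliques. The function names, the choice of `k`, and the toy profiles are assumptions.

```python
def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def k_nearest(profiles, gene, k):
    """Return the k genes closest to `gene` (ties broken by gene name)."""
    others = sorted(
        (g for g in profiles if g != gene),
        key=lambda g: (euclidean(profiles[gene], profiles[g]), g),
    )
    return set(others[:k])


def mutual_knn_graph(profiles, k):
    """Keep an edge g-h only if each gene is among the other's k nearest neighbors."""
    nn = {g: k_nearest(profiles, g, k) for g in profiles}
    return {g: {h for h in nn[g] if g in nn[h]} for g in profiles}


def components(graph):
    """Split the graph into connected clusters; genes without any mutual
    neighbor are returned separately as unclustered."""
    seen, clusters, unclustered = set(), [], []
    for start in sorted(graph):
        if start in seen:
            continue
        if not graph[start]:
            unclustered.append(start)
            seen.add(start)
            continue
        stack, members = [start], set()
        while stack:
            gene = stack.pop()
            if gene in members:
                continue
            members.add(gene)
            stack.extend(graph[gene] - members)
        seen |= members
        clusters.append(sorted(members))
    return clusters, unclustered


# Hypothetical toy profiles: two tight groups and one outlier.
profiles = {
    "g1": [0, 0], "g2": [0, 1], "g3": [1, 0],
    "g4": [10, 10], "g5": [10, 11], "g6": [11, 10],
    "out": [5, 30],
}
clusters, unclustered = components(mutual_knn_graph(profiles, k=2))
print(clusters)
print(unclustered)
```

The outlier has nearest neighbors of its own, but none of them lists it in return, so it stays unclustered. This is the behavior the abstract attributes to the mutual-neighbor requirement.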

Conclusion
The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the analysis of large datasets, and its ability to span a wide range of biological functions with high precision.

July 12, 2007

Nested Effects Models for High-Dimensional Phenotyping Screens

F. Markowetz , D. Kostka , O. Troyanskaya, R. Spang

Motivation: In high-dimensional phenotyping screens, a large number of cellular features is observed after perturbing genes by knockouts or RNA interference. Comprehensive analysis of perturbation effects is one of the most powerful techniques for attributing functions to genes, but not much work has been done so far to adapt statistical and computational methodology to the specific needs of large-scale and high-dimensional phenotyping screens.
Results: We introduce and compare probabilistic methods to efficiently infer a genetic hierarchy from the nested structure of observed perturbation effects. These hierarchies elucidate the structures of signaling pathways and regulatory networks. Our methods achieve two goals: (1) they reveal clusters of genes with highly similar phenotypic profiles, and (2) they order (clusters of) genes according to subset relationships between phenotypes. We evaluate our algorithms in the controlled setting of simulation studies and show their practical use in two experimental scenarios: (1) a data set investigating the response to microbial challenge in Drosophila melanogaster, and (2) a compendium of expression profiles of Saccharomyces cerevisiae knockout strains. We show that our methods identify biologically justified genetic hierarchies of perturbation effects.
Availability: The software used in our analysis is freely available in the R package ‘nem’ from www.bioconductor.org
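
The two goals named in the Results (clustering genes with matching phenotypic profiles and ordering clusters by subset relationships) can be illustrated with a noise-free toy sketch. The paper's methods are probabilistic; the deterministic set comparison below is a stand-in chosen for illustration, it is not the `nem` package API, and the function name and data are hypothetical.

```python
def nested_hierarchy(effects):
    """Group perturbed genes with identical effect sets, then add an edge
    from cluster a to cluster b when b's effects are a proper subset of a's."""
    groups = {}
    for gene, observed in effects.items():
        groups.setdefault(frozenset(observed), []).append(gene)
    clusters = {tuple(sorted(genes)): eff for eff, genes in groups.items()}
    edges = [
        (a, b)
        for a, eff_a in clusters.items()
        for b, eff_b in clusters.items()
        if eff_b < eff_a  # proper subset: b's phenotype is nested in a's
    ]
    return sorted(clusters), sorted(edges)


# Hypothetical screen: perturbed gene -> set of downstream features that changed.
effects = {
    "K1": {"e1", "e2", "e3"},
    "K2": {"e1", "e2"},
    "K3": {"e1", "e2"},
    "K4": {"e3"},
}
clusters, edges = nested_hierarchy(effects)
print(clusters)
print(edges)
```

Here K2 and K3 form one cluster because their effect sets match, and K1 is placed above both that cluster and K4 because its effects contain theirs. Real screens are noisy, which is why the paper scores hierarchies probabilistically instead of testing exact subsets.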

Computational Identification of Cellular Networks and Pathways

F. Markowetz, O. Troyanskaya

In this article we highlight recent developments in computational functional genomics to identify networks of functionally related genes and proteins based on diverse sources of genomic data. Our specific focus is on statistical methods to identify genetic networks. We discuss integrated analysis of microarray datasets, methods to combine heterogeneous data sources, the analysis of high-dimensional phenotyping screens and describe efforts to establish a reliable and unbiased gold standard for method comparison and evaluation.

May 14, 2007