661 Publications

Context-Sensitive Data Integration and Prediction of Biological Networks

C. Myers, O. Troyanskaya

MOTIVATION:
Several recent methods have addressed the problem of heterogeneous data integration and network prediction by modeling the noise inherent in high-throughput genomic datasets, which can dramatically improve specificity and sensitivity and allow the robust integration of datasets with heterogeneous properties. However, experimental technologies capture different biological processes with varying degrees of success, and thus, each source of genomic data can vary in relevance depending on the biological process one is interested in predicting. Accounting for this variation can significantly improve network prediction, but to our knowledge, no previous approaches have explicitly leveraged this critical information about biological context.
RESULTS:
We confirm the presence of context-dependent variation in functional genomic data and propose a Bayesian approach for context-sensitive integration and query-based recovery of biological process-specific networks. By applying this method to Saccharomyces cerevisiae, we demonstrate that leveraging contextual information can significantly improve the precision of network predictions, including assignments for uncharacterized genes. We expect that this general context-sensitive approach can be applied to other organisms and prediction scenarios.
AVAILABILITY:
A software implementation of our approach is available on request from the authors.
SUPPLEMENTARY INFORMATION:
Supplementary data are available at http://avis.princeton.edu/contextPIXIE/
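The core idea of context-sensitive integration can be illustrated with a naive-Bayes combination in which the reliability of each data source is specified separately for each biological process. This is a minimal sketch under stated assumptions, not the authors' model: the helper `posterior`, the dataset labels, the two contexts, and every probability value below are hypothetical and were chosen only to show that identical evidence can yield different confidence in different contexts.

```python
def posterior(prior, likelihoods, evidence):
    """Hypothetical helper: naive-Bayes posterior P(related | evidence) for one gene pair.

    prior: P(related) within the chosen biological context.
    likelihoods: {dataset: {observed_bin: (P(bin | related), P(bin | unrelated))}}
    evidence: {dataset: observed_bin}; datasets with no observation are omitted.
    """
    p_related = prior
    p_unrelated = 1.0 - prior
    for dataset, observed_bin in evidence.items():
        given_related, given_unrelated = likelihoods[dataset][observed_bin]
        # Assumes datasets are conditionally independent given the pair's label.
        p_related *= given_related
        p_unrelated *= given_unrelated
    return p_related / (p_related + p_unrelated)


# Hypothetical context-specific tables: co-expression is assumed to be
# informative for "ribosome biogenesis" but only weakly so for "DNA repair".
CONTEXT_TABLES = {
    "ribosome biogenesis": {
        "coexpression": {"high": (0.6, 0.1)},
        "physical_interaction": {"yes": (0.3, 0.05)},
    },
    "DNA repair": {
        "coexpression": {"high": (0.2, 0.1)},
        "physical_interaction": {"yes": (0.3, 0.05)},
    },
}

evidence = {"coexpression": "high", "physical_interaction": "yes"}
for context, tables in CONTEXT_TABLES.items():
    print(context, round(posterior(0.1, tables, evidence), 3))
```

With these made-up numbers the same evidence gives a posterior of 0.8 in the first context and about 0.57 in the second, because the co-expression data are assumed to be less reliable for the second process.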

Nearest Neighbor Networks: Clustering Expression Data Based on Gene Neighborhoods

C. Huttenhower, A. Flamholz, J. Landis, S. Sahi, C. Myers, K. Olszewski, M. Hibbs, N. Siemers, O. Troyanskaya, H. Coller

Background
The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this volume of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they have drawbacks: they can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g., ribosomes).

Results
We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods.

Conclusion
The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the analysis of large datasets, and its ability to span a wide range of biological functions with high precision.
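The two ingredients named in the Results, a network of mutual nearest neighbors and clusters built from overlapping cliques, can be sketched as follows. This is a simplified illustration, not the published NNN implementation: the use of Pearson correlation, the hypothetical functions `mutual_knn_graph` and `nnn_clusters`, the parameters `k` and `min_size`, and the rule that cliques sharing at least `min_size - 1` genes are merged are all assumptions made for the example.

```python
import numpy as np


def mutual_knn_graph(data, k):
    """Link genes i and j only if each is among the other's k most correlated
    genes (Pearson correlation between rows of `data`; an assumed metric)."""
    corr = np.corrcoef(data)
    np.fill_diagonal(corr, -np.inf)  # a gene is never its own neighbor
    top = [set(np.argsort(-corr[i])[:k].tolist()) for i in range(len(data))]
    return {i: {j for j in top[i] if i in top[j]} for i in range(len(data))}


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def nnn_clusters(data, k=3, min_size=3):
    """Hypothetical simplification of NNN: keep maximal cliques with at least
    `min_size` genes and merge any two that share >= min_size - 1 genes."""
    cliques = [c for c in maximal_cliques(mutual_knn_graph(data, k)) if len(c) >= min_size]
    parent = list(range(len(cliques)))  # union-find over cliques

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(len(cliques)):
        for b in range(a + 1, len(cliques)):
            if len(cliques[a] & cliques[b]) >= min_size - 1:
                parent[find(a)] = find(b)
    merged = {}
    for idx, clique in enumerate(cliques):
        merged.setdefault(find(idx), set()).update(clique)
    return list(merged.values())


# Toy data: two groups of four co-expressed genes plus one unrelated gene.
rng = np.random.default_rng(0)
base_a, base_b = rng.normal(size=20), rng.normal(size=20)
data = np.vstack(
    [base_a + 0.1 * rng.normal(size=20) for _ in range(4)]
    + [base_b + 0.1 * rng.normal(size=20) for _ in range(4)]
    + [rng.normal(size=20)]
)
print(sorted(sorted(c) for c in nnn_clusters(data)))
```

A gene that is not a mutual neighbor of any other gene ends up in a clique of size one and is filtered out, which mirrors the stated property that genes without sufficiently similar partners remain unclustered.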

July 12, 2007

Nested Effects Models for High-Dimensional Phenotyping Screens

F. Markowetz, D. Kostka, O. Troyanskaya, R. Spang

Motivation: In high-dimensional phenotyping screens, a large number of cellular features is observed after perturbing genes by knockouts or RNA interference. Comprehensive analysis of perturbation effects is one of the most powerful techniques for attributing functions to genes, but not much work has been done so far to adapt statistical and computational methodology to the specific needs of large-scale and high-dimensional phenotyping screens.
Results: We introduce and compare probabilistic methods to efficiently infer a genetic hierarchy from the nested structure of observed perturbation effects. These hierarchies elucidate the structures of signaling pathways and regulatory networks. Our methods achieve two goals: (1) they reveal clusters of genes with highly similar phenotypic profiles, and (2) they order (clusters of) genes according to subset relationships between phenotypes. We evaluate our algorithms in the controlled setting of simulation studies and show their practical use in two experimental scenarios: (1) a data set investigating the response to microbial challenge in Drosophila melanogaster, and (2) a compendium of expression profiles of Saccharomyces cerevisiae knockout strains. We show that our methods identify biologically justified genetic hierarchies of perturbation effects.
Availability: The software used in our analysis is freely available in the R package ‘nem’ from www.bioconductor.org
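The subset ordering described in the Results can be illustrated with a deterministic toy version that assumes noise-free, binary perturbation effects. The methods in the paper are probabilistic and are implemented in the R package ‘nem’; the Python function `nested_hierarchy` and the gene and feature names below are hypothetical and only show how identical profiles form clusters and how nested profiles induce an order.

```python
def nested_hierarchy(effects):
    """Hypothetical helper: group genes by identical effect sets and order the
    groups by strict subset relations between their effect sets.

    effects: {gene: set of features affected when that gene is perturbed}.
    Returns (clusters, edges) where each edge (upper, lower) means the effects
    of `lower` form a strict subset of the effects of `upper`.
    Edges are not transitively reduced.
    """
    groups = {}
    for gene, features in effects.items():
        groups.setdefault(frozenset(features), []).append(gene)
    clusters = {features: tuple(sorted(genes)) for features, genes in groups.items()}
    edges = [
        (clusters[upper], clusters[lower])
        for upper in clusters
        for lower in clusters
        if lower < upper  # frozenset "<" tests for a strict subset
    ]
    return sorted(clusters.values()), sorted(edges)


# Hypothetical screen: perturbing A affects every feature that B, C, or D affects.
effects = {"A": {1, 2, 3, 4}, "B": {1, 2}, "C": {1, 2}, "D": {3}}
clusters, edges = nested_hierarchy(effects)
print(clusters)  # [('A',), ('B', 'C'), ('D',)]
print(edges)     # [(('A',), ('B', 'C')), (('A',), ('D',))]
```

B and C share an identical profile and form one cluster, while A is placed above both that cluster and D because its effects contain theirs. Real screens are noisy, which is why the paper's approach scores hierarchies probabilistically rather than testing exact subsets.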

Computational Identification of Cellular Networks and Pathways

F. Markowetz, O. Troyanskaya

In this article we highlight recent developments in computational functional genomics to identify networks of functionally related genes and proteins based on diverse sources of genomic data. Our specific focus is on statistical methods to identify genetic networks. We discuss integrated analysis of microarray datasets, methods to combine heterogeneous data sources, and the analysis of high-dimensional phenotyping screens, and we describe efforts to establish a reliable and unbiased gold standard for method comparison and evaluation.

May 14, 2007

Functional Analysis of Gene Duplications in Saccharomyces cerevisiae

Y. Guan, M. Dunham, O. Troyanskaya

Gene duplication can occur on two scales: whole-genome duplications (WGD) and smaller-scale duplications (SSD) involving individual genes or genomic segments. Duplicated genes may remain functionally redundant or may diverge in function through neofunctionalization or subfunctionalization. The effect of duplication scale on functional evolution has not yet been explored, probably because of the lack of global knowledge of protein function and the different timing of duplication events. To address this question, we used integrated Bayesian analysis of diverse functional genomic data to accurately evaluate the extent of functional similarity and divergence between paralogs on a global scale. We found that paralogs resulting from the whole-genome duplication are more likely to share interaction partners and biological functions than smaller-scale duplicates, independent of sequence similarity. In addition, WGD paralogs show a lower frequency of essential genes and a higher rate of synthetic lethality, but diverge more in expression pattern and upstream regulatory region. Thus, our analysis demonstrates that WGD paralogs generally have similar compensatory functions but diverging expression patterns, suggesting that paralogs arising through different duplication mechanisms may follow distinct evolutionary scenarios. Furthermore, by identifying these functional disparities between the two types of duplicates, we reconcile previous disputes on the relationship between sequence divergence and expression divergence or essentiality.

Genetics
2007

Modeling Complex Genetic Interactions in a Simple Eukaryotic Genome: Actin Displays a Rich Spectrum of Complex Haploinsufficiencies

B. Haarer, S. Viggiano, M. Hibbs, O. Troyanskaya, D. Amberg

Multigenic influences are major contributors to human genetic disorders. Because humans are highly polymorphic, there is a large number of possible detrimental multiallelic gene pairs. The actin cytoskeleton of yeast was used to determine the potential for deleterious bigenic interactions; approximately 4800 complex hemizygote strains were constructed between an actin-null allele and the nonessential gene deletion collection. We found 208 genes that have deleterious complex haploinsufficient (CHI) interactions with actin. This set is enriched for genes with Gene Ontology terms shared with actin, including several actin-binding protein genes, and nearly half of the CHI genes have defects in actin organization when deleted. Interactions were frequently seen with genes for multiple components of a complex or with genes involved in the same function. For example, many of the genes for the large ribosomal subunit (RPLs) were CHI with act1Δ and had actin organization defects when deleted. This was generally true of only one RPL paralog of apparently duplicate genes, suggesting functional specialization between ribosomal genes. In many cases, CHI interactions could be attributed to localized defects on the actin protein. Spatial congruence in these data suggests that the loss of binding to specific actin-binding proteins causes subsets of CHI interactions.

Integrated Analysis of Microarray Results

Gene expression microarrays are becoming increasingly widespread, especially as a way to rapidly identify putative functions of unknown genes. Accurate microarray data analysis, however, remains a challenge. The recent availability of multiple types of high-throughput functional genomic data can facilitate accurate and effective analysis of microarray experiments and thereby accelerate functional annotation of sequenced genomes. But genomic data often sacrifice specificity for scale, yielding very large quantities of data that are of lower quality than those produced by traditional experimental methods. Advanced analysis methods are thus necessary for accurate functional interpretation of these large-scale datasets. This chapter outlines recently developed methods that integrate the analysis of microarray data with sequence, interaction, localization, and literature data, and it further outlines specific problems in currently available integrated analysis technologies.

A Scalable Method for Integration and Functional Analysis of Multiple Microarray Datasets

C. Huttenhower, M. Hibbs, C. Myers, O. Troyanskaya

Motivation: The diverse microarray datasets that have become available over the past several years represent a rich opportunity and challenge for biological data mining. Many supervised and unsupervised methods have been developed for the analysis of individual microarray datasets. However, integrated analysis of multiple datasets can provide a broader insight into genetic regulation of specific biological pathways under a variety of conditions.
Results: To aid in the analysis of such large compendia of microarray experiments, we present Microarray Experiment Functional Integration Technology (MEFIT), a scalable Bayesian framework for predicting functional relationships from integrated microarray datasets. Furthermore, MEFIT predicts these functional relationships within the context of specific biological processes. All results are provided in the context of one or more specific biological functions, which can be provided by a biologist or drawn automatically from catalogs such as the Gene Ontology (GO). Using MEFIT, we integrated 40 Saccharomyces cerevisiae microarray datasets spanning 712 unique conditions. In tests based on 110 biological functions drawn from the GO biological process ontology, MEFIT improved performance by 5% or more for 54 functions and decreased it by 5% or more for only two functions.
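One dataset's contribution to a function-specific Bayesian integration can be sketched by discretizing each gene pair's expression correlation and estimating, from gold-standard pairs for a single biological function, how often each bin occurs among related and unrelated pairs. This is an assumption-laden illustration rather than MEFIT itself: the bin boundaries, the pseudocount, the naive-Bayes combination across datasets, the toy expression values, and the names `pair_bin`, `learn_cpt`, and `posterior_related` are all hypothetical.

```python
import numpy as np

BIN_EDGES = (-0.5, 0.0, 0.5)  # hypothetical correlation bin boundaries (4 bins)


def pair_bin(profiles, i, j):
    """Return the bin index (0..3) of the Pearson correlation between genes i and j."""
    r = np.corrcoef(profiles[i], profiles[j])[0, 1]
    return int(np.digitize(r, BIN_EDGES))


def learn_cpt(profiles, related_pairs, unrelated_pairs, pseudocount=1.0):
    """Hypothetical helper: estimate P(bin | related) (row 0) and
    P(bin | unrelated) (row 1) for one dataset from gold-standard pairs
    drawn from a single biological function."""
    counts = np.full((2, len(BIN_EDGES) + 1), pseudocount)
    for row, pairs in ((0, related_pairs), (1, unrelated_pairs)):
        for i, j in pairs:
            counts[row, pair_bin(profiles, i, j)] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def posterior_related(datasets, cpts, i, j, prior):
    """Naive-Bayes combination across datasets, assuming conditional independence."""
    log_odds = np.log(prior / (1.0 - prior))
    for profiles, cpt in zip(datasets, cpts):
        b = pair_bin(profiles, i, j)
        log_odds += np.log(cpt[0, b] / cpt[1, b])
    return float(1.0 / (1.0 + np.exp(-log_odds)))


# Toy dataset: five genes measured under five conditions (made-up values).
profiles = np.array([
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 6],
    [2, 3, 4, 5, 6],
    [5, 4, 3, 2, 1],
    [1, 3, 2, 5, 4],
], dtype=float)
cpt = learn_cpt(profiles, related_pairs=[(0, 1), (0, 2), (1, 2)],
                unrelated_pairs=[(0, 3), (1, 3), (2, 3)])
print(posterior_related([profiles], [cpt], 0, 1, prior=0.5))
```

With these toy values the highly correlated pair (0, 1) receives a posterior of about 0.8. Repeating `learn_cpt` with gold-standard pairs from a different function would yield a different table for the same dataset, which is one way to obtain the per-process predictions the abstract describes.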
