645 Publications

IMP: A Multi-Species Functional Genomics Portal for Integration, Visualization and Prediction of Protein Functions and Networks

A. Wong, C. Park, C. Greene, L. Bongo, Y. Guan, O. Troyanskaya

Integrative multi-species prediction (IMP) is an interactive web server that enables molecular biologists to interpret experimental results and to generate hypotheses in the context of a large cross-organism compendium of functional predictions and networks. The system provides a framework for biologists to analyze their candidate gene sets in the context of functional networks, as they expand or focus these sets by mining functional relationships predicted from integrated high-throughput data. IMP integrates prior knowledge and data collections from multiple organisms in its analyses. Through flexible and interactive visualizations, researchers can compare functional contexts and interpret the behavior of their gene sets across organisms. Additionally, IMP identifies homologs with conserved functional roles for knowledge transfer, allowing for accurate function predictions even for biological processes that have very few experimental annotations in a given organism. IMP currently supports seven organisms (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Caenorhabditis elegans and Saccharomyces cerevisiae), does not require any registration or installation and is freely available for use at http://imp.princeton.edu.

An Effective Statistical Evaluation of ChIPseq Dataset Similarity

M. Chikina, O. Troyanskaya

MOTIVATION:
ChIPseq is rapidly becoming a common technique for investigating protein-DNA interactions. However, results from individual experiments provide a limited understanding of chromatin structure, as various chromatin factors cooperate in complex ways to orchestrate transcription. In order to quantify chromatin interactions, it is thus necessary to devise a robust similarity metric applicable to ChIPseq data. Unfortunately, moving past simple overlap calculations to give statistically rigorous comparisons of ChIPseq datasets often involves arbitrary choices of distance metrics, with significance being estimated by computationally intensive permutation tests whose statistical power may be sensitive to non-biological experimental and post-processing variation.
RESULTS:
We show that it is in fact possible to compare ChIPseq datasets through the efficient computation of exact P-values for proximity. Our method is insensitive to non-biological variation in datasets such as peak width, and can rigorously model peak location biases by evaluating similarity conditioned on a restricted set of genomic regions (such as mappable genome or promoter regions). Applying our method to the well-studied dataset of Chen et al. (2008), we elucidate novel interactions which conform well with our biological understanding. By comparing ChIPseq data in an asymmetric way, we are able to observe clear interaction differences between cofactors such as p300 and factors that bind DNA directly.
AVAILABILITY:
Source code is available for download at http://sonorus.princeton.edu/IntervalStats/IntervalStats.tar.gz.
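
As a rough illustration of what an exact proximity P-value conditioned on a restricted region can look like, the following minimal sketch treats peaks as single positions and the allowed region as an explicit collection of coordinates. It is a simplified toy under those assumptions, not the authors' IntervalStats implementation; the function names `nearest_distance` and `proximity_p_value` are hypothetical.

```python
from bisect import bisect_left


def nearest_distance(pos, sorted_refs):
    """Distance from pos to the closest value in the sorted list sorted_refs."""
    i = bisect_left(sorted_refs, pos)
    candidates = []
    if i < len(sorted_refs):
        candidates.append(sorted_refs[i] - pos)
    if i > 0:
        candidates.append(pos - sorted_refs[i - 1])
    return min(candidates)


def proximity_p_value(query, refs, domain):
    """Exact probability that a position drawn uniformly from domain lies
    at least as close to a reference peak as the query position does.

    Assumption: peaks are points and domain is an explicit collection of
    allowed coordinates (for example, mappable or promoter positions).
    """
    sorted_refs = sorted(refs)
    observed = nearest_distance(query, sorted_refs)
    hits = sum(
        1 for pos in domain if nearest_distance(pos, sorted_refs) <= observed
    )
    return hits / len(domain)


# Same query and reference peaks, evaluated against two different domains.
whole_region = range(0, 100)
restricted_region = range(0, 20)
print(proximity_p_value(12, [10, 50], whole_region))
print(proximity_p_value(12, [10, 50], restricted_region))
```

Because the probability is obtained by enumerating the domain rather than by permutation, it is exact for the chosen null model, and restricting the domain changes the P-value for the same query and reference peaks.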

Integrated Molecular Profiles of Invasive Breast Tumors and Ductal Carcinoma in Situ (DCIS) Reveal Differential Vascular and Interleukin Signaling

V. Kristensen, C. Vaske, C. Ursini-Siegel, P. Van Loo, S. Nordgard, R. Sachidanandam, T. Sørlie, F. Wärnberg, V. Haakensen, Å. Helland, B. Naume, C. Perou, D. Haussler, O. Troyanskaya, A. Børresen-Dale

We use an integrated approach to understand breast cancer heterogeneity by modeling mRNA, copy number alterations, microRNAs, and methylation in a pathway context utilizing the pathway recognition algorithm using data integration on genomic models (PARADIGM). We demonstrate that combining mRNA expression and DNA copy number classified the patients in groups that provide the best predictive value with respect to prognosis and identified key molecular and stromal signatures. A chronic inflammatory signature, which promotes the development and/or progression of various epithelial tumors, is uniformly present in all breast cancers. We further demonstrate that within the adaptive immune lineage, the strongest predictor of good outcome is the acquisition of a gene signature that favors a high T-helper 1 (Th1)/cytotoxic T-lymphocyte response at the expense of Th2-driven humoral immunity. Patients who have breast cancer with a basal HER2-negative molecular profile (PDGM2) are characterized by high expression of protumorigenic Th2/humoral-related genes (24-38%) and a low Th1/Th2 ratio. The luminal molecular subtypes are again differentiated by low or high FOXM1 and ERBB4 signaling. We show that the interleukin signaling profiles observed in invasive cancers are absent or weakly expressed in healthy tissue but already prominent in ductal carcinoma in situ, together with ECM and cell-cell adhesion regulating pathways. The most prominent difference between low and high mammographic density in healthy breast tissue by PARADIGM was that of STAT4 signaling. In conclusion, by means of a pathway-based modeling methodology (PARADIGM) integrating different layers of molecular data from whole-tumor samples, we demonstrate that we can stratify immune signatures that predict patient survival.

Nucleosome-Coupled Expression Differences in Closely-Related Species

Y. Guan, V. Yao, K. Tsui, M. Gebbia, M. Dunham, O. Troyanskaya

BACKGROUND:
Genome-wide nucleosome occupancy is negatively related to the average level of transcription factor motif binding based on studies in yeast and several other model organisms. The degree to which nucleosome-motif interactions relate to phenotypic changes across species is, however, unknown.
RESULTS:
We address this challenge by generating nucleosome positioning and cell cycle expression data for Saccharomyces bayanus and show that differences in nucleosome occupancy reflect cell cycle expression divergence between two yeast species, S. bayanus and S. cerevisiae. Specifically, genes with nucleosome-depleted MBP1 motifs upstream of their coding sequence show periodic expression during the cell cycle, whereas genes with nucleosome-shielded motifs do not. In addition, conserved cell cycle regulatory motifs across these two species are more nucleosome-depleted compared to those that are not conserved, suggesting that the degree of conservation of regulatory sites varies, and is reflected by nucleosome occupancy patterns. Finally, many changes in cell cycle gene expression patterns across species can be correlated to changes in nucleosome occupancy on motifs (rather than to the presence or absence of motifs).
CONCLUSIONS:
Our observations suggest that alteration of nucleosome occupancy is a previously uncharacterized feature related to the divergence of cell cycle expression between species.

September 26, 2011

PILGRM: An Interactive Data-Driven Discovery Platform for Expert Biologists

C. Greene, O. Troyanskaya

PILGRM (the platform for interactive learning by genomics results mining) puts advanced supervised analysis techniques applied to enormous gene expression compendia into the hands of bench biologists. This flexible system empowers its users to answer diverse biological questions that are often outside of the scope of common databases in a data-driven manner. This capability allows domain experts to quickly and easily generate hypotheses about biological processes, tissues or diseases of interest. Specifically, PILGRM helps biologists generate these hypotheses by analyzing the expression levels of known relevant genes in large compendia of microarray data. Because PILGRM is data-driven, it complements a user’s knowledge and literature analysis with mining of diverse functional genomic data, thereby generating novel predictions that can drive experimental follow-up. This server is free, does not require registration and is available for use at http://pilgrm.princeton.edu.

Accurate Quantification of Functional Analogy among Close Homologs

M. Chikina, O. Troyanskaya

Correctly evaluating functional similarities among homologous proteins is necessary for accurate transfer of experimental knowledge from one organism to another, and is of particular importance for the development of animal models of human disease. While the fact that sequence similarity implies functional similarity is a fundamental paradigm of molecular biology, sequence comparison does not directly assess the extent to which two proteins participate in the same biological processes, and has limited utility for analyzing families with several paralogous members. Nevertheless, we show that it is possible to provide a cross-organism functional similarity measure in an unbiased way through the exclusive use of high-throughput gene-expression data. Our methodology is based on probabilistic cross-species mapping of functionally analogous proteins based on Bayesian integrative analysis of gene expression compendia. We demonstrate that even among closely related genes, our method is able to predict functionally analogous homolog pairs better than relying on sequence comparison alone. We also demonstrate that the landscape of functional similarity is often complex and that definitive “functional orthologs” do not always exist. Even in these cases, our method and the online interface we provide are designed to allow detailed exploration of sources of inferred functional similarity that can be evaluated by the user.

Integrated Functional Networks of Process, Tissue, and Developmental Stage Specific Interactions in Arabidopsis Thaliana

A. Pop, C. Huttenhower, A. Iyer-Pascuzzi, P. Benfey, O. Troyanskaya

Background
Recent years have seen an explosion in plant genomics, as the difficulties inherent in sequencing and functionally analyzing these biologically and economically significant organisms have been overcome. Arabidopsis thaliana, a versatile model organism, represents an opportunity to evaluate the predictive power of biological network inference for plant functional genomics.

Results
Here, we provide a compendium of functional relationship networks for Arabidopsis thaliana leveraging data integration based on over 60 microarray, physical and genetic interaction, and literature curation datasets. These include tissue, biological process, and development stage specific networks, each predicting relationships specific to an individual biological context. These biological networks enable the rapid investigation of uncharacterized genes in specific tissues and developmental stages of interest and summarize a very large collection of A. thaliana data for biological examination. We found validation in the literature for many of our predicted networks, including those involved in disease resistance, root hair patterning, and auxin homeostasis.

Conclusions
These context-specific networks demonstrate that highly specific biological hypotheses can be generated for a diversity of individual processes, developmental stages, and plant tissues in A. thaliana. All predicted functional networks are available online at http://function.princeton.edu/arathGraphle.

Mapping Dynamic Histone Acetylation Patterns to Gene Expression in Nanog-Depleted Murine Embryonic Stem Cells

F. Markowetz, K. Mulder, E. Airoldi, I. Lemischka, O. Troyanskaya

Embryonic stem cells (ESC) have the potential to self-renew indefinitely and to differentiate into any of the three germ layers. The molecular mechanisms for self-renewal, maintenance of pluripotency and lineage specification are poorly understood, but recent results point to a key role for epigenetic mechanisms. In this study, we focus on quantifying the impact of histone 3 acetylation (H3K9,14ac) on gene expression in murine embryonic stem cells. We analyze genome-wide histone acetylation patterns and gene expression profiles measured over the first five days of cell differentiation triggered by silencing Nanog, a key transcription factor in ESC regulation. We explore the temporal and spatial dynamics of histone acetylation data and its correlation with gene expression using supervised and unsupervised statistical models. On a genome-wide scale, changes in acetylation are significantly correlated with changes in mRNA expression and, surprisingly, this coherence increases over time. We quantify the predictive power of histone acetylation for gene expression changes in a balanced cross-validation procedure. In an in-depth study we focus on genes central to the regulatory network of mouse ESC, including those identified in a recent genome-wide RNAi screen and in the PluriNet, a computationally derived stem cell signature. We find that compared to the rest of the genome, ESC-specific genes show significantly more acetylation signal and a much stronger decrease in acetylation over time, which is often not reflected in a concordant expression change. These results shed light on the complexity of the relationship between histone acetylation and gene expression and are a step toward dissecting the multilayer regulatory mechanisms that determine stem cell fate.

Simultaneous Genome-Wide Inference of Physical, Genetic, Regulatory, and Functional Pathway Components

C. Park, D. Hess, C. Huttenhower, O. Troyanskaya

Biomolecular pathways are built from diverse types of pairwise interactions, ranging from physical protein-protein interactions and modifications to indirect regulatory relationships. One goal of systems biology is to bridge three aspects of this complexity: the growing body of high-throughput data assaying these interactions; the specific interactions in which individual genes participate; and the genome-wide patterns of interactions in a system of interest. Here, we describe methodology for simultaneously predicting specific types of biomolecular interactions using high-throughput genomic data. This results in a comprehensive compendium of whole-genome networks for yeast, derived from ∼3,500 experimental conditions and describing 30 interaction types, which range from general (e.g. physical or regulatory) to specific (e.g. phosphorylation or transcriptional regulation). We used these networks to investigate molecular pathways in carbon metabolism and cellular transport, proposing a novel connection between glycogen breakdown and glucose utilization supported by recent publications. Additionally, 14 specific predicted interactions in DNA topological change and protein biosynthesis were experimentally validated. We analyzed the systems-level network features within all interactomes, verifying the presence of small-world properties and enrichment for recurring network motifs. This compendium of physical, synthetic, regulatory, and functional interaction networks has been made publicly available through an interactive web interface for investigators to utilize in future research at http://function.princeton.edu/bioweaver/.

Functional Genomics Complements Quantitative Genetics in Identifying Disease-Gene Associations

Y. Guan, C. Ackert-Bicknell, B. Kell, O. Troyanskaya

An ultimate goal of genetic research is to understand the connection between genotype and phenotype in order to improve the diagnosis and treatment of diseases. The quantitative genetics field has developed a suite of statistical methods to associate genetic loci with diseases and phenotypes, including quantitative trait loci (QTL) linkage mapping and genome-wide association studies (GWAS). However, each of these approaches has technical and biological shortcomings. For example, the amount of heritable variation explained by GWAS is often surprisingly small and the resolution of many QTL linkage mapping studies is poor. The predictive power and interpretation of QTL and GWAS results are consequently limited. In this study, we propose a complementary approach to quantitative genetics by interrogating the vast amount of high-throughput genomic data in model organisms to functionally associate genes with phenotypes and diseases. Our algorithm combines the genome-wide functional relationship network for the laboratory mouse and a state-of-the-art machine learning method. We demonstrate the superior accuracy of this algorithm through predicting genes associated with each of 1157 diverse phenotype ontology terms. Comparison between our prediction results and a meta-analysis of quantitative genetic studies reveals both overlapping candidates and distinct, accurate predictions uniquely identified by our approach. Focusing on bone mineral density (BMD), a phenotype related to osteoporotic fracture, we experimentally validated two of our novel predictions (not observed in any previous GWAS/QTL studies) and found significant bone density defects for both Timp2 and Abcg8 deficient mice. Our results suggest that the integration of functional genomics data into networks, which itself is informative of protein function and interactions, can successfully be utilized as a complementary approach to quantitative genetics to predict disease risks.
All supplementary material is available at http://cbfg.jax.org/phenotype.
