698 Publications

Mapping Dynamic Histone Acetylation Patterns to Gene Expression in Nanog-Depleted Murine Embryonic Stem Cells

F. Markowetz, K. Mulder, E. Airoldi, I. Lemischka, O. Troyanskaya

Embryonic stem cells (ESCs) have the potential to self-renew indefinitely and to differentiate into any of the three germ layers. The molecular mechanisms of self-renewal, maintenance of pluripotency and lineage specification are poorly understood, but recent results point to a key role for epigenetic mechanisms. In this study, we focus on quantifying the impact of histone H3 acetylation (H3K9/K14ac) on gene expression in murine embryonic stem cells. We analyze genome-wide histone acetylation patterns and gene expression profiles measured over the first five days of cell differentiation triggered by silencing Nanog, a key transcription factor in ESC regulation. We explore the temporal and spatial dynamics of the histone acetylation data and their correlation with gene expression using supervised and unsupervised statistical models. On a genome-wide scale, changes in acetylation are significantly correlated with changes in mRNA expression and, surprisingly, this coherence increases over time. We quantify the predictive power of histone acetylation for gene expression changes in a balanced cross-validation procedure. In an in-depth study, we focus on genes central to the regulatory network of mouse ESCs, including those identified in a recent genome-wide RNAi screen and in PluriNet, a computationally derived stem cell signature. We find that, compared to the rest of the genome, ESC-specific genes show significantly more acetylation signal and a much stronger decrease in acetylation over time, which is often not reflected in a concordant expression change. These results shed light on the complexity of the relationship between histone acetylation and gene expression and are a step toward dissecting the multilayer regulatory mechanisms that determine stem cell fate.
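As a rough illustration of the genome-wide coherence measure, the sketch below computes, for each time point, the Pearson correlation between per-gene acetylation changes and per-gene mRNA expression changes (for example, log2 fold changes relative to day 0). The abstract does not state which statistic was used; Pearson correlation, the function names and the input layout are assumptions made only for this example.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    # Raises ZeroDivisionError if either vector is constant.
    return cov / math.sqrt(var_x * var_y)


def coherence_by_day(
    acetyl_change: Dict[int, List[float]],
    expr_change: Dict[int, List[float]],
) -> Dict[int, float]:
    """Correlate acetylation and expression changes gene-by-gene at each day.

    Hypothetical layout: both inputs map a day to a list of per-gene changes,
    with genes listed in the same order in both dictionaries.
    """
    return {
        day: pearson(acetyl_change[day], expr_change[day])
        for day in sorted(acetyl_change)
        if day in expr_change
    }
```

Under these assumptions, a correlation that rises from one day to the next would correspond to the increasing coherence reported above.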


Simultaneous Genome-Wide Inference of Physical, Genetic, Regulatory, and Functional Pathway Components

C. Park, D. Hess, C. Huttenhower, O. Troyanskaya

Biomolecular pathways are built from diverse types of pairwise interactions, ranging from physical protein-protein interactions and modifications to indirect regulatory relationships. One goal of systems biology is to bridge three aspects of this complexity: the growing body of high-throughput data assaying these interactions; the specific interactions in which individual genes participate; and the genome-wide patterns of interactions in a system of interest. Here, we describe methodology for simultaneously predicting specific types of biomolecular interactions using high-throughput genomic data. This results in a comprehensive compendium of whole-genome networks for yeast, derived from ∼3,500 experimental conditions and describing 30 interaction types, which range from general (e.g. physical or regulatory) to specific (e.g. phosphorylation or transcriptional regulation). We used these networks to investigate molecular pathways in carbon metabolism and cellular transport, proposing a novel connection between glycogen breakdown and glucose utilization supported by recent publications. Additionally, 14 specific predicted interactions in DNA topological change and protein biosynthesis were experimentally validated. We analyzed the systems-level network features within all interactomes, verifying the presence of small-world properties and enrichment for recurring network motifs. This compendium of physical, synthetic, regulatory, and functional interaction networks has been made publicly available through an interactive web interface for investigators to utilize in future research at http://function.princeton.edu/bioweaver/.


Functional Genomics Complements Quantitative Genetics in Identifying Disease-Gene Associations

Y. Guan, C. Ackert-Bicknell, B. Kell, O. Troyanskaya

An ultimate goal of genetic research is to understand the connection between genotype and phenotype in order to improve the diagnosis and treatment of diseases. The quantitative genetics field has developed a suite of statistical methods to associate genetic loci with diseases and phenotypes, including quantitative trait loci (QTL) linkage mapping and genome-wide association studies (GWAS). However, each of these approaches has technical and biological shortcomings. For example, the amount of heritable variation explained by GWAS is often surprisingly small, and the resolution of many QTL linkage mapping studies is poor. The predictive power and interpretation of QTL and GWAS results are consequently limited. In this study, we propose a complementary approach to quantitative genetics by interrogating the vast amount of high-throughput genomic data in model organisms to functionally associate genes with phenotypes and diseases. Our algorithm combines the genome-wide functional relationship network for the laboratory mouse with a state-of-the-art machine learning method. We demonstrate the superior accuracy of this algorithm by predicting genes associated with each of 1157 diverse phenotype ontology terms. Comparison between our prediction results and a meta-analysis of quantitative genetic studies reveals both overlapping candidates and distinct, accurate predictions uniquely identified by our approach. Focusing on bone mineral density (BMD), a phenotype related to osteoporotic fracture, we experimentally validated two of our novel predictions (not observed in any previous GWAS/QTL studies) and found significant bone density defects in both Timp2- and Abcg8-deficient mice. Our results suggest that the integration of functional genomics data into networks, which itself is informative of protein function and interactions, can successfully be used as a complementary approach to quantitative genetics to predict disease risks.
All supplementary material is available at http://cbfg.jax.org/phenotype.


Integrative Systems Biology for Data-Driven Knowledge Discovery

C. Greene, O. Troyanskaya

Integrative systems biology is an approach that brings together diverse high-throughput experiments and databases to gain new insights into biological processes or systems at molecular through physiological levels. These approaches rely on diverse high-throughput experimental techniques that generate heterogeneous data by assaying varying aspects of complex biological processes. Computational approaches are necessary to provide an integrative view of these experimental results and enable data-driven knowledge discovery. Hypotheses generated from these approaches can direct definitive molecular experiments in a cost-effective manner. By using integrative systems biology approaches, we can leverage existing biological knowledge and large-scale data to improve our understanding of as yet unknown components of a system of interest and how its malfunction leads to disease.


Quantitative Analysis of Fitness and Genetic Interactions in Yeast on a Genome Scale

A. Baryshnikova, M. Costanzo, Y. Kim, H. Ding, J. Koh, K. Toufighi, J-Y. Youn, J. Ou, B-J. San Luis, S. Bandyopadhyay, M. Hibbs, D. Hess, A-C. Gingras, G. Bader, O. Troyanskaya, G. Brown, B. Andrews, C. Boone, C. Myers

Global quantitative analysis of genetic interactions is a powerful approach for deciphering the roles of genes and mapping functional relationships among pathways. Using colony size as a proxy for fitness, we developed a method for measuring fitness-based genetic interactions from high-density arrays of yeast double mutants generated by synthetic genetic array (SGA) analysis. We identified several experimental sources of systematic variation and developed normalization strategies to obtain accurate single- and double-mutant fitness measurements, which rival the accuracy of other high-resolution studies. We applied the resulting SGA score to examine the relationship between physical and genetic interaction networks and found that positive genetic interactions connect functionally distinct protein complexes, revealing a network of genetic suppression among loss-of-function alleles.
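Fitness-based genetic interaction scores of this kind are commonly defined with a multiplicative model, in which the expected fitness of a double mutant is the product of the two single-mutant fitnesses, so that epsilon = f_ab - f_a * f_b. The minimal sketch below assumes fitness values have already been normalized so that wild type equals 1; it omits the colony-size normalization described in the paper, and the 0.08 classification cutoff is an illustrative assumption rather than a value taken from this abstract.

```python
def interaction_score(f_a: float, f_b: float, f_ab: float) -> float:
    """Observed double-mutant fitness minus the multiplicative expectation f_a * f_b."""
    return f_ab - f_a * f_b


def classify(epsilon: float, cutoff: float = 0.08) -> str:
    """Label an interaction score; the default cutoff is an assumed, illustrative value."""
    if epsilon <= -cutoff:
        return "negative"  # double mutant sicker than expected (e.g. synthetic sick/lethal)
    if epsilon >= cutoff:
        return "positive"  # double mutant healthier than expected (e.g. suppression)
    return "neutral"


# Expected double-mutant fitness is 0.8 * 0.9 = 0.72; observed is 0.5.
eps = interaction_score(0.8, 0.9, 0.5)
print(round(eps, 2), classify(eps))  # -0.22 negative
```

The multiplicative model is used here because it treats two mutations with independent effects as non-interacting, so any deviation from the product signals an interaction.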


Systematic Planning of Genome-Scale Experiments in Poorly Studied Species

Y. Guan, M. Dunham, A. Caudy, O. Troyanskaya

Genome-scale datasets have been used extensively in model organisms to screen for specific candidates or to predict functions for uncharacterized genes. However, despite the availability of extensive knowledge in model organisms, the planning of genome-scale experiments in poorly studied species is still based on the intuition of experts or heuristic trials. We propose that computational and systematic approaches can be applied to drive the experiment-planning process in poorly studied species based on available data and knowledge in closely related model organisms. In this paper, we suggest a computational strategy for recommending genome-scale experiments based on their capability to interrogate diverse biological processes to enable protein function assignment. To this end, we use the data-rich functional genomics compendium of the model organism to quantify the accuracy of each dataset in predicting each specific biological process and the overlap in such coverage between different datasets. Our approach uses an optimized combination of these quantifications to recommend an ordered list of experiments for accurately annotating most proteins in the poorly studied related organism to most biological processes, as well as a set of experiments that target each specific biological process. The effectiveness of this experiment-planning system is demonstrated for two related yeast species: the model organism Saccharomyces cerevisiae and the comparatively poorly studied Saccharomyces bayanus. Our system recommended a set of S. bayanus experiments based on an S. cerevisiae microarray data compendium. In silico evaluations estimate that fewer than 10% of the experiments could achieve functional coverage similar to that of the whole microarray compendium. This estimate was confirmed by performing the recommended experiments in S. bayanus, significantly reducing the labor needed to characterize the poorly studied genome.
This experiment-planning framework could readily be adapted to the design of other types of large-scale experiments as well as other groups of organisms.
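One simple way to turn per-dataset, per-process accuracy estimates into an ordered list of recommended experiments is greedy coverage: repeatedly pick the dataset that adds the most not-yet-covered biological processes. This is a generic greedy set-cover heuristic swapped in for illustration, not the optimized combination used in the paper; the accuracy threshold, the budget, and the dataset and process names in the usage example are hypothetical toy values.

```python
from typing import Dict, List, Set


def plan_experiments(
    accuracy: Dict[str, Dict[str, float]],
    min_accuracy: float,
    budget: int,
) -> List[str]:
    """Greedily order datasets by how many new processes each one covers.

    accuracy maps dataset -> {process: estimated prediction accuracy},
    e.g. as estimated on a model organism's data compendium (assumed input).
    """

    def covered_by(dataset: str) -> Set[str]:
        # Processes this dataset predicts well enough to count as covered.
        return {p for p, acc in accuracy[dataset].items() if acc >= min_accuracy}

    covered: Set[str] = set()
    chosen: List[str] = []
    candidates = set(accuracy)
    while candidates and len(chosen) < budget:
        # Dataset adding the most uncovered processes; ties go to the
        # alphabetically first name because max() keeps the first maximum.
        best = max(sorted(candidates), key=lambda d: len(covered_by(d) - covered))
        if not covered_by(best) - covered:
            break  # no remaining dataset adds coverage
        covered |= covered_by(best)
        chosen.append(best)
        candidates.remove(best)
    return chosen


toy = {
    "heat_shock": {"stress": 0.9, "ribosome": 0.4},
    "cell_cycle": {"mitosis": 0.85, "ribosome": 0.7},
    "nitrogen": {"stress": 0.75, "ribosome": 0.8, "amino_acid": 0.9},
}
print(plan_experiments(toy, min_accuracy=0.7, budget=3))  # ['nitrogen', 'cell_cycle']
```

In this toy case "heat_shock" is never recommended because the process it covers well is already covered, which mirrors the idea of accounting for overlap in coverage between datasets.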


The Genetic Landscape of a Cell

M. Costanzo, A. Baryshnikova, J. Bellay, Y. Kim, E. Spear, C. Sevier, H. Ding, J. Koh, K. Toufighi, S. Mostafavi, J. Prinz, R. St Onge, B. VanderSluis, T. Makhnevych, F. Vizeacoumar, S. Alizadeh, S. Bahr, R. Brost, Y. Chen, M. Cokol, R. Deshpande, Z. Li, Z. Lin, W. Liang, M. Marback, J. Paw, B. San Luis, E. Shuteriqi, A. Tong, N. van Dyk, I. Wallace, J. Whitney, M. Weirauch, G. Zhong, H. Zhu, W. A. Houry, M. Brudno, S. Ragibizadeh, B. Papp, C. Pál, F. Roth, G. Giaever, C. Nislow, O. Troyanskaya, H. Bussey, G. Bader, A-C. Gingras, Q. Morris, P. Kim, C. Kaiser, C. Myers, B. Andrews, C. Boone

A genome-scale genetic interaction map was constructed by examining 5.4 million gene-gene pairs for synthetic genetic interactions, generating quantitative genetic interaction profiles for approximately 75% of all genes in the budding yeast, Saccharomyces cerevisiae. A network based on genetic interaction profiles reveals a functional map of the cell in which genes of similar biological processes cluster together in coherent subsets, and highly correlated profiles delineate specific pathways to define gene function. The global network identifies functional cross-connections between all bioprocesses, mapping a cellular wiring diagram of pleiotropy. Genetic interaction degree correlated with a number of different gene attributes, which may be informative about genetic network hubs in other organisms. We also demonstrate that extensive and unbiased mapping of the genetic landscape provides a key for interpretation of chemical-genetic interactions and drug target identification.


Detailing Regulatory Networks through Large Scale Data Integration

C. Huttenhower, K. Mutungu, N. Indik, W. Yang, M. Schroeder, J. Forman, O. Troyanskaya, H. Coller

MOTIVATION:
Much of a cell's regulatory response to changing environments occurs at the transcriptional level. Particularly in higher organisms, transcription factors (TFs), microRNAs and epigenetic modifications can combine to form a complex regulatory network. Part of this system can be modeled as a collection of regulatory modules: co-regulated genes, the conditions under which they are co-regulated and sequence-level regulatory motifs.

RESULTS:
We present the Combinatorial Algorithm for Expression and Sequence-based Cluster Extraction (COALESCE) system for regulatory module prediction. The algorithm is efficient enough to discover expression biclusters and putative regulatory motifs in metazoan genomes (>20,000 genes) and very large microarray compendia (>10,000 conditions). Using Bayesian data integration, it can also include diverse supporting data types such as evolutionary conservation or nucleosome placement. We validate its performance using a functional evaluation of co-clustered genes, known yeast and Escherichia coli TF targets, synthetic data and various metazoan data compendia. In all cases, COALESCE performs as well as or better than current biclustering and motif prediction tools, with high accuracy in functional and TF/target assignments and zero false positives on synthetic data. COALESCE provides an efficient and flexible platform within which large, diverse data collections can be integrated to predict metazoan regulatory networks.

AVAILABILITY:
Source code (C++) is available at http://function.princeton.edu/sleipnir, and supporting data and a web interface are provided at http://function.princeton.edu/coalesce.

December 15, 2009

Graphle: Interactive Exploration of Large, Dense Graphs

C. Huttenhower, S. Mehmood, O. Troyanskaya

BACKGROUND:
A wide variety of biological data can be modeled as network structures, including experimental results (e.g. protein-protein interactions), computational predictions (e.g. functional interaction networks), or curated structures (e.g. the Gene Ontology). While several tools exist for visualizing large graphs at a global level or small graphs in detail, previous systems have generally not allowed interactive analysis of dense networks containing thousands of vertices at a level of detail useful for biologists. Investigators often wish to explore specific portions of such networks from a detailed, gene-specific perspective, and balancing this requirement with the networks' large size, complex structure, and rich metadata is a substantial computational challenge.

RESULTS:
Graphle is an online interface to large collections of arbitrary undirected, weighted graphs, each possibly containing tens of thousands of vertices (e.g. genes) and hundreds of millions of edges (e.g. interactions). These are stored on a centralized server and accessed efficiently through an interactive Java applet. The Graphle applet allows a user to examine specific portions of a graph, retrieving the relevant neighborhood around a set of query vertices (genes). This neighborhood can then be refined and modified interactively, and the results can be saved either as publication-quality images or as raw data for further analysis. The Graphle web site currently includes several hundred biological networks representing predicted functional relationships from three heterogeneous data integration systems: S. cerevisiae data from bioPIXIE, E. coli data using MEFIT, and H. sapiens data from HEFalMp.

CONCLUSIONS:
Graphle serves as a search and visualization engine for biological networks, which can be managed locally (simplifying collaborative data sharing) and investigated remotely. The Graphle framework is freely downloadable and easily installed on new servers, allowing any lab to quickly set up a Graphle site from which their own biological network data can be shared online.
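The neighborhood retrieval around a set of query genes can be illustrated with a one-hop expansion over a weighted, undirected graph: keep edges above a weight cutoff, rank neighbors by their summed edge weight to the queries, and return the induced subgraph. This is a hypothetical Python sketch, not Graphle's actual server or Java applet code; the cutoff, the neighbor limit, and the ranking rule are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, float]


def build_adjacency(edges: Iterable[Edge]) -> Dict[str, Dict[str, float]]:
    """Store each undirected, weighted edge in both directions."""
    adj: Dict[str, Dict[str, float]] = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def query_neighborhood(
    adj: Dict[str, Dict[str, float]],
    queries: Set[str],
    min_weight: float,
    max_neighbors: int,
) -> List[Edge]:
    """Return edges of the subgraph induced by the queries and their top neighbors."""
    # Score each non-query neighbor by its total edge weight to the query set.
    scores: Dict[str, float] = defaultdict(float)
    for q in queries:
        for nbr, w in adj.get(q, {}).items():
            if w >= min_weight and nbr not in queries:
                scores[nbr] += w
    # Keep the strongest neighbors (ties broken by name for determinism).
    top = sorted(scores, key=lambda n: (-scores[n], n))[:max_neighbors]
    nodes = set(queries) | set(top)
    # Report each qualifying edge once (u < v) among the retained nodes.
    return sorted(
        (u, v, w)
        for u in nodes
        for v, w in adj.get(u, {}).items()
        if v in nodes and u < v and w >= min_weight
    )


adj = build_adjacency(
    [("A", "B", 0.9), ("A", "C", 0.3), ("B", "C", 0.8), ("C", "D", 0.95), ("A", "E", 0.6)]
)
print(query_neighborhood(adj, {"A"}, min_weight=0.5, max_neighbors=2))
# [('A', 'B', 0.9), ('A', 'E', 0.6)]
```

Capping the number of neighbors keeps the returned view small enough to inspect even when the full network is dense, which is the trade-off the abstract describes.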

December 14, 2009

Systems-Level Dynamic Analyses of Fate Change in Murine Embryonic Stem Cells

R. Lu, F. Markowetz, R. Unwin, J. Leek, E. Airoldi, B. MacArthur, A. Lachmann, R. Rozov, A. Ma'ayan, L. Boyer, O. Troyanskaya, A. Whetton, I. Lemischka

Molecular regulation of embryonic stem cell (ESC) fate involves a coordinated interaction between epigenetic, transcriptional and translational mechanisms. It is unclear how these different molecular regulatory mechanisms interact to regulate changes in stem cell fate. Here we present a dynamic systems-level study of cell fate change in murine ESCs following a well-defined perturbation. Global changes in histone acetylation, chromatin-bound RNA polymerase II, messenger RNA (mRNA), and nuclear protein levels were measured over 5 days after downregulation of Nanog, a key pluripotency regulator. Our data demonstrate how a single genetic perturbation leads to progressive widespread changes in several molecular regulatory layers, and provide a dynamic view of information flow in the epigenome, transcriptome and proteome. We observe that a large proportion of changes in nuclear protein levels are not accompanied by concordant changes in the expression of corresponding mRNAs, indicating important roles for translational and post-translational regulation of ESC fate. Gene-ontology analysis across different molecular layers indicates that although chromatin reconfiguration is important for altering cell fate, it is preceded by transcription-factor-mediated regulatory events. The temporal order of gene expression alterations reveals the sequence in which the regulatory network is reconfigured and offers further insight into the gene regulatory network. Our studies extend the conventional systems biology approach to include many molecular species, regulatory layers and temporal series, and underscore the complexity of the multilayer regulatory mechanisms responsible for changes in protein expression that determine stem cell fate.
