162 Publications

A New System for Comparative Functional Genomics of Saccharomyces Yeasts

A. Caudy, Yuanfang Guan, Y. Jia, C. Hansen, C. DeSevo, A.P. Hayes, J. Agee, J.R. Alvarez-Dominguez, H. Arellano, D. Barrett, C. Bauerle, N. Bisaria, P. Bradley, J.S. Breunig, E. Bush, D. Cappel, E. Capra, W. Chen, J. Clore, P. Combs, C. Doucette, O. Demuren, P. Fellowes, S. Freeman, E. Frenkel, D. Gadala-Maria, R. Gawande, D. Glass, S. Grossberg, A. Gupta, L. Hammonds-Odie, A. Hoisos, Jenny Hsi, Y. Huang Hsu, S. Inukai, K. J. Karczewski, X. Ke, M. Kojima, S. Leachman, D. Lieber, A. Liebowitz, J. Liu, Y. Liu, T. Martin, J. Rosa Mendoza, C. Myhrvold, C. Millian, S. Pfau, S. Raj, M. Rich, J. Rokicki, W. Rounds, M. Salazar, M. Salesi, R. Sharma, S. Silverman, C. Singer, S. Sinha, M. Staller, P. Stern, H. Tang, S. Weeks, M. Weidmann, A. Wolf, C. Young, J. Yuan, C. Crutchfield, M. McClean, C. Murphy, M. Llinás, D. Botstein, O. Troyanskaya, M. Dunham

Whole-genome sequencing, particularly in fungi, has progressed at a tremendous rate. More difficult, however, is experimental testing of the inferences about gene function that can be drawn from comparative sequence analysis alone. We present a genome-wide functional characterization of a sequenced but experimentally understudied budding yeast, Saccharomyces bayanus var. uvarum (henceforth referred to as S. bayanus), allowing us to map changes over the 20 million years that separate this organism from S. cerevisiae. We first created a suite of genetic tools to facilitate work in S. bayanus. Next, we measured the gene-expression response of S. bayanus to a diverse set of perturbations optimized using a computational approach to cover a diverse array of functionally relevant biological responses. The resulting data set reveals that gene-expression patterns are largely conserved, but significant changes may exist in regulatory networks such as carbohydrate utilization and meiosis. In addition to regulatory changes, our approach identified gene functions that have diverged. The functions of genes in core pathways are highly conserved, but we observed many changes in which genes are involved in osmotic stress, peroxisome biogenesis, and autophagy. A surprising number of genes specific to S. bayanus respond to oxidative stress, suggesting the organism may have evolved under different selection pressures than S. cerevisiae. This work expands the scope of genome-scale evolutionary studies from sequence-based analysis to rapid experimental characterization and could be adopted for functional mapping in any lineage of interest. Furthermore, our detailed characterization of S. bayanus provides a valuable resource for comparative functional genomics studies in yeast.

Functional Knowledge Transfer for High-Accuracy Prediction of Under-Studied Biological Processes

C. Park, A. Wong, C. Greene, J. Rowland, Y. Guan, L. Bongo, R. Burdine, O. Troyanskaya

A key challenge in genetics is identifying the functional roles of genes in pathways. Numerous functional genomics techniques (e.g. machine learning) that predict protein function have been developed to address this question. These methods generally build from existing annotations of genes to pathways and thus are often unable to identify additional genes participating in processes that are not already well studied. Many of these processes are well studied in some organism, but not necessarily in an investigator's organism of interest. Sequence-based search methods (e.g. BLAST) have been used to transfer such annotation information between organisms. We demonstrate that functional genomics can complement traditional sequence similarity to improve the transfer of gene annotations between organisms. Our method transfers annotations only when functionally appropriate as determined by genomic data and can be used with any prediction algorithm to combine transferred gene function knowledge with organism-specific high-throughput data to enable accurate function prediction.

We show that diverse state-of-the-art machine learning algorithms leveraging functional knowledge transfer (FKT) dramatically improve their accuracy in predicting gene-pathway membership, particularly for processes with little experimental knowledge in an organism. We also show that our method compares favorably to annotation transfer by sequence similarity. Next, we deploy FKT with a state-of-the-art SVM classifier to predict novel genes for 11,000 biological processes across six diverse organisms and expand the coverage of accurate function predictions to processes that are often ignored because of a dearth of annotated genes in an organism. Finally, we perform in vivo experimental investigation in Danio rerio and confirm the regulatory role of our top predicted novel gene, wnt5b, in leftward cell migration during heart development. FKT is immediately applicable to many bioinformatics techniques and will help biologists systematically integrate prior knowledge from diverse systems to direct targeted experiments in their organism of study.

Chapter 2: Data-Driven View of Disease Biology

C. Greene, O. Troyanskaya

Modern experimental strategies often generate genome-scale measurements of human tissues or cell lines in various physiological states. Investigators often use these datasets individually to help elucidate molecular mechanisms of human diseases. Here we discuss approaches that effectively weight and integrate hundreds of heterogeneous datasets into gene-gene networks that focus on a specific process or disease. Diverse and systematic genome-scale measurements provide such approaches both a great deal of power and a number of challenges. We discuss some such challenges as well as methods to address them. We also raise important considerations for the assessment and evaluation of such approaches. When carefully applied, these integrative data-driven methods can make novel high-quality predictions that can transform our understanding of the molecular basis of human disease.

Involvement of Histone Demethylase LSD1 in Short-Time-Scale Gene Expression Changes during Cell Cycle Progression in Embryonic Stem Cells

V. Nair, Y. Ge, N. Balasubramaniyan, J. Kim, Y. Okawa, M. Chikina, O. Troyanskaya, S. Sealfon

The histone demethylase LSD1, a component of the CoREST (corepressor for element 1-silencing transcription factor) corepressor complex, plays an important role in the downregulation of gene expression during development. However, the activities of LSD1 in mediating short-time-scale gene expression changes have not been well understood. To reveal the mechanisms underlying these two distinct functions of LSD1, we performed genome-wide mapping and cellular localization studies of LSD1 and its substrate, dimethylated histone 3 lysine 4 (H3K4me2), in mouse embryonic stem cells (ES cells). Our results showed an extensive overlap between the LSD1 and H3K4me2 genomic regions and a correlation between the genomic levels of LSD1/H3K4me2 and gene expression, including many highly expressed ES cell genes. LSD1 is recruited to the chromatin of cells in the G1/S/G2 phases and is displaced from the chromatin of M-phase cells, suggesting that LSD1 or H3K4me2 alternatively occupies LSD1 genomic regions during cell cycle progression. LSD1 knockdown by RNA interference or its displacement from the chromatin by antineoplastic agents caused an increase in the levels of a subset of LSD1 target genes. Taken together, these results suggest that cell cycle-dependent association and dissociation of LSD1 with chromatin mediates short-time-scale gene expression changes during embryonic stem cell cycle progression.

Tissue-Specific Functional Networks for Prioritizing Phenotype and Disease Genes

Y. Guan, D. Gorenshteyn, M. Burmeister, A. Wong, J. Schimenti, M. Handel, C. Bult, M. Hibbs, O. Troyanskaya

Integrated analyses of functional genomics data have enormous potential for identifying phenotype-associated genes. Tissue specificity is an important aspect of many genetic diseases, reflecting the potentially different roles of proteins and pathways in diverse cell lineages. Accounting for tissue specificity in global integration of functional genomics data is challenging, as “functionality” and “functional relationships” are often not resolved for specific tissue types. We address this challenge by generating tissue-specific functional networks, which can effectively represent the diversity of protein function for more accurate identification of phenotype-associated genes in the laboratory mouse. Specifically, we created 107 tissue-specific functional relationship networks through integration of genomic data utilizing knowledge of tissue-specific gene expression patterns. Cross-network comparison revealed significantly changed genes enriched for functions related to specific tissue development. We then utilized these tissue-specific networks to predict genes associated with different phenotypes. Our results demonstrate that prediction performance is significantly improved through using the tissue-specific networks as compared to the global functional network. We used a testis-specific functional relationship network to predict genes associated with male fertility and spermatogenesis phenotypes, and experimentally confirmed one top prediction, Mybl1. We then focused on a less-common genetic disease, ataxia, and identified candidates uniquely predicted by the cerebellum network, which are supported by both literature and experimental evidence. Our systems-level, tissue-specific scheme advances over traditional global integration and analyses and establishes a prototype to address the tissue-specific effects of genetic perturbations, diseases and drugs.

September 27, 2012

Accurate Evaluation and Analysis of Functional Genomics Data and Methods

C. Greene, O. Troyanskaya

The development of technology capable of inexpensively performing large-scale measurements of biological systems has generated a wealth of data. Integrative analysis of these data holds the promise of uncovering gene function, regulation, and, in the longer run, understanding complex disease. However, their analysis has proved very challenging, as it is difficult to quickly and effectively assess the relevance and accuracy of these data for individual biological questions. Here, we identify biases that present challenges for the assessment of functional genomics data and methods. We then discuss evaluation methods that, taken together, begin to address these issues. We also argue that the funding of systematic data-driven experiments and of high-quality curation efforts will further improve evaluation metrics so that they more accurately assess functional genomics data and methods. Such metrics will allow researchers in the field of functional genomics to continue to answer important biological questions in a data-driven manner.

IMP: A Multi-Species Functional Genomics Portal for Integration, Visualization and Prediction of Protein Functions and Networks

A. Wong, C. Park, C. Greene, L. Bongo, Y. Guan, O. Troyanskaya

Integrative multi-species prediction (IMP) is an interactive web server that enables molecular biologists to interpret experimental results and to generate hypotheses in the context of a large cross-organism compendium of functional predictions and networks. The system provides a framework for biologists to analyze their candidate gene sets in the context of functional networks, as they expand or focus these sets by mining functional relationships predicted from integrated high-throughput data. IMP integrates prior knowledge and data collections from multiple organisms in its analyses. Through flexible and interactive visualizations, researchers can compare functional contexts and interpret the behavior of their gene sets across organisms. Additionally, IMP identifies homologs with conserved functional roles for knowledge transfer, allowing for accurate function predictions even for biological processes that have very few experimental annotations in a given organism. IMP currently supports seven organisms (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Caenorhabditis elegans and Saccharomyces cerevisiae), does not require any registration or installation and is freely available for use at http://imp.princeton.edu.

An Effective Statistical Evaluation of ChIPseq Dataset Similarity

M. Chikina, O. Troyanskaya

MOTIVATION:
ChIPseq is rapidly becoming a common technique for investigating protein-DNA interactions. However, results from individual experiments provide a limited understanding of chromatin structure, as various chromatin factors cooperate in complex ways to orchestrate transcription. In order to quantify chromatin interactions, it is thus necessary to devise a robust similarity metric applicable to ChIPseq data. Unfortunately, moving past simple overlap calculations to give statistically rigorous comparisons of ChIPseq datasets often involves arbitrary choices of distance metrics, with significance being estimated by computationally intensive permutation tests whose statistical power may be sensitive to non-biological experimental and post-processing variation.
RESULTS:
We show that it is in fact possible to compare ChIPseq datasets through the efficient computation of exact P-values for proximity. Our method is insensitive to non-biological variation in datasets such as peak width, and can rigorously model peak location biases by evaluating similarity conditioned on a restricted set of genomic regions (such as mappable genome or promoter regions). Applying our method to the well-studied dataset of Chen et al. (2008), we elucidate novel interactions which conform well with our biological understanding. By comparing ChIPseq data in an asymmetric way, we are able to observe clear interaction differences between cofactors such as p300 and factors that bind DNA directly.
AVAILABILITY:
Source code is available for download at http://sonorus.princeton.edu/IntervalStats/IntervalStats.tar.gz.
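
ILLUSTRATION:
The idea of an exact proximity P-value conditioned on a restricted domain can be sketched in a few lines of Python. This is not the IntervalStats implementation; it is a simplified sketch under stated assumptions: each query peak is reduced to a single integer position, intervals are half-open (start, end) pairs, and the null model places the query uniformly over the positions of the domain. All function names here are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def overlap_length(a, b):
    """Count positions shared by two merged, sorted interval lists."""
    total, i, j = 0, 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def distance_to_nearest(pos, refs):
    """Distance from an integer position to the closest reference interval."""
    def dist(start, end):
        if start <= pos < end:
            return 0
        return start - pos if pos < start else pos - (end - 1)
    return min(dist(s, e) for s, e in refs)


def proximity_p_value(query_pos, refs, domain):
    """Exact P-value under the assumed uniform null: the fraction of domain
    positions at least as close to a reference interval as the query is."""
    d = distance_to_nearest(query_pos, refs)
    domain = merge(domain)
    # Positions within distance d of a reference interval [s, e) form [s - d, e + d).
    near = merge([(s - d, e + d) for s, e in refs])
    return overlap_length(near, domain) / sum(e - s for s, e in domain)


# Whole region as domain: 22 of 100 positions lie within distance 6 of the reference.
p_all = proximity_p_value(55, refs=[(40, 50)], domain=[(0, 100)])
# Restricted domain (for example, mappable regions only): 3 of 60 positions qualify.
p_restricted = proximity_p_value(62, refs=[(40, 50)], domain=[(0, 20), (60, 100)])
```

Because the P-value is computed by interval arithmetic rather than by sampling, no permutations are needed, and only the query's position enters the calculation. Swapping the roles of query and reference generally gives a different value, which is the sense in which such a comparison is asymmetric.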

Integrated Molecular Profiles of Invasive Breast Tumors and Ductal Carcinoma in Situ (DCIS) Reveal Differential Vascular and Interleukin Signaling

V. Kristensen, C. Vaske, C. Ursini-Siegel, P. Van Loo, S. Nordgard, R. Sachidanandam, T. Sørlie, F. Wärnberg, V. Haakensen, Å. Helland, B. Naume, C. Perou, D. Haussler, O. Troyanskaya, A. Børresen-Dale

We use an integrated approach to understand breast cancer heterogeneity by modeling mRNA, copy number alterations, microRNAs, and methylation in a pathway context utilizing the pathway recognition algorithm using data integration on genomic models (PARADIGM). We demonstrate that combining mRNA expression and DNA copy number classified the patients in groups that provide the best predictive value with respect to prognosis and identified key molecular and stromal signatures. A chronic inflammatory signature, which promotes the development and/or progression of various epithelial tumors, is uniformly present in all breast cancers. We further demonstrate that within the adaptive immune lineage, the strongest predictor of good outcome is the acquisition of a gene signature that favors a high T-helper 1 (Th1)/cytotoxic T-lymphocyte response at the expense of Th2-driven humoral immunity. Patients who have breast cancer with a basal HER2-negative molecular profile (PDGM2) are characterized by high expression of protumorigenic Th2/humoral-related genes (24-38%) and a low Th1/Th2 ratio. The luminal molecular subtypes are again differentiated by low or high FOXM1 and ERBB4 signaling. We show that the interleukin signaling profiles observed in invasive cancers are absent or weakly expressed in healthy tissue but already prominent in ductal carcinoma in situ, together with ECM and cell-cell adhesion regulating pathways. The most prominent difference between low and high mammographic density in healthy breast tissue by PARADIGM was that of STAT4 signaling. In conclusion, by means of a pathway-based modeling methodology (PARADIGM) integrating different layers of molecular data from whole-tumor samples, we demonstrate that we can stratify immune signatures that predict patient survival.
