162 Publications

Interpretation of an individual functional genomics experiment guided by massive public data

Y. Lee, A. Wong, A. Tadych, B. Hartmann, C. Park, V. DeJesus, I. Ramos, E. Zaslavsky, S. Sealfon, O. Troyanskaya

A key unmet challenge in interpreting omics experiments is inferring biological meaning in the context of public functional genomics data. We developed a computational framework, Your Evidence Tailored Integration (YETI; http://yeti.princeton.edu/ ), which creates specialized functional interaction maps from large public datasets relevant to an individual omics experiment. Using this tailored integration, we predicted and experimentally confirmed an unexpected divergence in viral replication after seasonal or pandemic human influenza virus infection.

An integrative tissue-network approach to identify and test human disease genes.

V. Yao, R. Kaletsky, W. Keyes, D. Mor, A. Wong, S. Sohrabi, C. Murphy, O. Troyanskaya

Effective discovery of causal disease genes must overcome the statistical challenges of quantitative genetics studies and the practical limitations of human biology experiments. Here we developed diseaseQUEST, an integrative approach that combines data from human genome-wide disease studies with in silico network models of tissue- and cell-type-specific function in model organisms to prioritize candidates within functionally conserved processes and pathways. We used diseaseQUEST to predict candidate genes for 25 different diseases and traits, including cancer, longevity, and neurodegenerative diseases. Focusing on Parkinson's disease (PD), a diseaseQUEST-directed Caenorhabditis elegans behavioral screen identified several candidate genes, which we experimentally verified and found to be associated with age-dependent motility defects mirroring PD clinical symptoms. Furthermore, knockdown of the top candidate gene, bcat-1, encoding a branched chain amino acid transferase, caused spasm-like 'curling' and neurodegeneration in C. elegans, paralleling decreased BCAT1 expression in PD patient brains. diseaseQUEST is modular and generalizable to other model organisms and human diseases of interest.

October 22, 2018

Enabling Precision Medicine through Integrative Network Models.

A key challenge in precision medicine lies in understanding molecular-level underpinnings of complex human disease. Biological networks in multicellular organisms can generate hypotheses about disease genes, pathways, and their behavior in disease-related tissues. Diverse functional genomic data, including expression, protein-protein interaction, and relevant sequence and literature information, can be utilized to build integrative networks that provide both genome-wide coverage and contextual specificity and accuracy. By carefully extracting the relevant signal in thousands of heterogeneous functional genomics experiments through integrative analysis, these networks model how genes work together in specific contexts to carry out cellular processes, thereby contributing to a molecular-level understanding of complex human disease and paving the way toward better therapy and drug treatment. Here, we discuss current methods to build context-specific integrative networks, focusing on tissue-specific networks. We highlight applications of these networks in predicting tissue-specific molecular response, identifying candidate disease genes, and increasing power by amplifying the disease signal in quantitative genetics data. Altogether, these exciting developments enable biomedical scientists to characterize disease from pathophysiology to cellular system and, finally, to specific gene alterations, making significant strides toward the goal of precision medicine.

Single-cell analysis of progenitor cell dynamics and lineage specification in the human fetal kidney.

R. Menon, E. Otto, A. Kokoruda, J. Zhou, Z. Zhang, E. Yoon, Y. Chen, O. Troyanskaya, J. Spence, M. Kretzler, C. Cebrián

The mammalian kidney develops through reciprocal interactions between the ureteric bud and the metanephric mesenchyme to give rise to the entire collecting system and the nephrons. Most of our knowledge of the developmental regulators driving this process arises from the study of gene expression and functional genetics in mice and other animal models. In order to shed light on human kidney development, we have used single-cell transcriptomics to characterize gene expression in different cell populations, and to study individual cell dynamics and lineage trajectories during development. Single-cell transcriptome analyses of 6414 cells from five individual specimens identified 11 initial clusters of specific renal cell types as defined by their gene expression profile. Further subclustering identifies progenitors, and mature and intermediate stages of differentiation for several renal lineages. Other lineages identified include mesangium, stroma, endothelial and immune cells. Novel markers for these cell types were revealed in the analysis, as were components of key signaling pathways driving renal development in animal models. Altogether, we provide a comprehensive and dynamic gene expression profile of the developing human kidney at the single-cell level.

August 30, 2018

Transcriptome analysis of adult Caenorhabditis elegans cells reveals tissue-specific gene and isoform expression.

R. Kaletsky, V. Yao, A. Williams, A. Runnels, A. Tadych, S. Zhou, O. Troyanskaya, C. Murphy

The biology and behavior of adults differ substantially from those of developing animals, and cell-specific information is critical for deciphering the biology of multicellular animals. Thus, adult tissue-specific transcriptomic data are critical for understanding molecular mechanisms that control their phenotypes. We used adult cell-specific isolation to identify the transcriptomes of C. elegans' four major tissues (or "tissue-ome"), identifying ubiquitously expressed and tissue-specific "enriched" genes. These data newly reveal the hypodermis' metabolic character, suggest potential worm-human tissue orthologies, and identify tissue-specific changes in the Insulin/IGF-1 signaling pathway. Tissue-specific alternative splicing analysis identified a large set of collagen isoforms. Finally, we developed a machine learning-based prediction tool for 76 sub-tissue cell types, which we used to predict cellular expression differences in IIS/FOXO signaling, stage-specific TGF-β activity, and basal vs. memory-induced CREB transcription. Together, these data provide a rich resource for understanding the biology governing multicellular adult animals.

August 10, 2018

Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk

J. Zhou, Chandra L. Theesfeld, K. Yao, K. Chen, A. Wong, O. Troyanskaya

Key challenges for human genetics, precision medicine and evolutionary biology include deciphering the regulatory code of gene expression and understanding the transcriptional effects of genome variation. However, this is extremely difficult because of the enormous scale of the noncoding mutation space. We developed a deep learning–based framework, ExPecto, that can accurately predict, ab initio from a DNA sequence, the tissue-specific transcriptional effects of mutations, including those that are rare or that have not been observed. We prioritized causal variants within disease- or trait-associated loci from all publicly available genome-wide association studies and experimentally validated predictions for four immune-related diseases. By exploiting the scalability of ExPecto, we characterized the regulatory mutation space for human RNA polymerase II–transcribed genes by in silico saturation mutagenesis and profiled > 140 million promoter-proximal mutations. This enables probing of evolutionary constraints on gene expression and ab initio prediction of mutation disease effects, making ExPecto an end-to-end computational framework for the in silico prediction of expression and disease risk.

July 16, 2018

GIANT 2.0: genome-scale integrated analysis of gene networks in tissues

A. Wong, Arjun Krishnan, O. Troyanskaya

GIANT2 (Genome-wide Integrated Analysis of gene Networks in Tissues) is an interactive web server that enables biomedical researchers to analyze their proteins and pathways of interest and generate hypotheses in the context of genome-scale functional maps of human tissues. The precise actions of genes are frequently dependent on their tissue context, yet direct assay of tissue-specific protein function and interactions remains infeasible in many normal human tissues and cell-types. With GIANT2, researchers can explore predicted tissue-specific functional roles of genes and reveal changes in those roles across tissues, all through interactive multi-network visualizations and analyses. Additionally, the NetWAS approach available through the server uses tissue-specific/cell-type networks predicted by GIANT2 to re-prioritize statistical associations from GWAS studies and identify disease-associated genes. GIANT2 predicts tissue-specific interactions by integrating diverse functional genomics data from now over 61 400 experiments for 283 diverse tissues and cell-types. GIANT2 does not require any registration or installation and is freely available for use at http://giant-v2.princeton.edu.

Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder

A. Krishnan, R. Zhang, V. Yao, C. Theesfeld, A. Wong, A. Tadych, N. Volfovsky, A. Packer, O. Troyanskaya

Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder with a strong genetic basis. Yet, only a small fraction of potentially causal genes (about 65 genes out of an estimated several hundred) are known with strong genetic evidence from sequencing studies. We developed a complementary machine-learning approach based on a human brain-specific gene network to present a genome-wide prediction of autism risk genes, including hundreds of candidates for which there is minimal or no prior genetic evidence. Our approach was validated in a large independent case-control sequencing study. Leveraging these genome-wide predictions and the brain-specific network, we demonstrated that the large set of ASD genes converges on a smaller number of key pathways and developmental stages of the brain. Finally, we identified likely pathogenic genes within frequent autism-associated copy-number variants and proposed genes and pathways that are likely mediators of ASD across multiple copy-number variants. All predictions and functional insights are available at http://asd.princeton.edu.

Bioinformatics Approaches to Profile the Tumor Microenvironment for Immunotherapeutic Discovery

T. Clancy, R. Dannenfelser, O. Troyanskaya, K. Malmberg, E. Hovig, V. Kristensen

In the microenvironment of a malignancy, tumor cells do not exist in isolation, but rather in a diverse ecosystem consisting not only of heterogeneous tumor-cell clones, but also normal cell types such as fibroblasts, vasculature, and an extensive pool of immune cells at numerous possible stages of activation and differentiation. This results in a complex interplay of diverse cellular signaling systems, where the immune cell component is now established to influence cancer progression and therapeutic response. It is experimentally difficult and laborious to comprehensively and systematically profile these distinct cell types from heterogeneous tumor samples in order to capitalize on potential therapeutic and biomarker discoveries. One emerging solution to address this challenge is to computationally extract cell-type specific information directly from bulk tumors. Such in silico approaches are advantageous because they can capture both the cell-type specific profiles and the tissue systems level of cell-cell interactions. Accurately and comprehensively predicting these patterns in tumors is an important challenge to overcome, not least given the success of immunotherapeutic drug treatment of several human cancers. This is especially challenging for subsets of closely related immune cell phenotypes with relatively small gene expression differences, which have critical functional distinctions. Here, we outline the existing and emerging novel bioinformatics strategies that can be used to profile the tumor immune landscape.

Data-driven Analysis of Immune Infiltrate In a Large Cohort of Breast Cancer and Its Association With Disease Progression

R. Dannenfelser, M. Nome, A. Tahiri, J. Ursini-Siegel, H. Vollan, V. Haakensen, A. Helland, B. Naume, C. Caldas, A. Borresen-Dale, V. Kristensen, O. Troyanskaya

The tumor microenvironment is now widely recognized for its role in tumor progression, treatment response, and clinical outcome. The intratumoral immunological landscape, in particular, has been shown to exert both pro-tumorigenic and anti-tumorigenic effects. Identifying immunologically active or silent tumors may be an important indication for administration of therapy, and detecting early infiltration patterns may uncover factors that contribute to early risk. Thus far, direct detailed studies of the cell composition of tumor infiltration have been limited, with some studies giving approximate quantifications using immunohistochemistry and other small studies obtaining detailed measurements by isolating cells from excised tumors and sorting them using flow cytometry. Herein we utilize a machine learning-based approach to identify lymphocyte markers with which we can quantify the presence of B cells, cytotoxic T-lymphocytes, T-helper 1, and T-helper 2 cells in any gene expression data set, and we apply it to studies of breast tissue. By leveraging over 2,100 samples from existing large-scale studies, we find an inherent cell heterogeneity in clinically characterized immune infiltrates, a strong link between estrogen receptor activity and infiltration in normal and tumor tissues, and changes with genomic complexity, and we identify characteristic differences in lymphocyte expression among molecular groupings. With our extendable methodology for capturing cell-type-specific signal, we systematically studied immune infiltration in breast cancer, finding an inverse correlation between beneficial lymphocyte infiltration and estrogen receptor activity in normal breast tissue and reduced infiltration in estrogen receptor-negative tumors with high genomic complexity.
