162 Publications

An automated framework for efficiently designing deep convolutional neural networks in genomics

Convolutional neural networks (CNNs) have become a standard for analysis of biological sequences. Tuning of network architectures is essential for CNN performance, yet it requires substantial knowledge of machine learning and commitment of time and effort. This process thus imposes a major barrier to broad and effective application of modern deep learning in genomics. Here, we present AMBER, a fully automated framework to efficiently design and apply CNNs for genomic sequences. AMBER designs optimal models for user-specified biological questions through state-of-the-art Neural Architecture Search (NAS). We applied AMBER to the task of modelling genomic regulatory features and demonstrated that the predictions of the AMBER-designed model are significantly more accurate than those of equivalent baseline non-NAS models and match or even exceed those of published expert-designed models. Interpretation of the AMBER architecture search revealed its design principle of utilizing the full space of computational operations for accurately modelling genomic sequences. Furthermore, we illustrated the use of AMBER to accurately discover functional genomic variants in allele-specific binding and disease heritability enrichment. AMBER provides an efficient automated method for designing accurate deep learning models in genomics.

August 19, 2020

Presenilin 1 phosphorylation regulates amyloid-β degradation by microglia

J Ledo, T Liebman, R Zhang, C Chang, E Azevedo, E Wong, H Silva, O. Troyanskaya, V Bustos, P Greengard

Amyloid-β peptide (Aβ) accumulation in the brain is a hallmark of Alzheimer’s disease. An important mechanism of Aβ clearance in the brain is uptake and degradation by microglia. Presenilin 1 (PS1) is the catalytic subunit of γ-secretase, an enzyme complex responsible for the maturation of multiple substrates, such as Aβ. Although PS1 has been extensively studied in neurons, the role of PS1 in microglia is incompletely understood. Here we report that microglia containing phospho-deficient mutant PS1 display a slower kinetic response to microinjury in the brain in vivo and are unable to degrade Aβ oligomers due to phagolysosome dysfunction. An Alzheimer’s mouse model containing phospho-deficient PS1 shows severe accumulation of Aβ and of the postsynaptic protein PSD95 in microglia. Our results demonstrate a novel mechanism by which PS1 modulates microglial function and contributes to Alzheimer’s disease-associated phenotypes.

August 13, 2020

Abstract 2504: Modeling molecular development of breast cancer in canine mammary tumors

K. Graim, D Gorenshteyn, D Robinson, N. Carriero, J Cahill, R Chakrabarti, M Goldschmidt, A Durham, J. Funk, J Storey, V Kristensen, C Theesfeld, K Sorenmo, O. Troyanskaya

Malignancy in cancer is a consequence of the progressive accumulation of mutations in a tumor, with profound implications for drug selection and treatment. However, in human studies, inter-patient variability obscures molecular signatures of tumor progression because patients usually present with a single mammary tumor. In contrast, dogs frequently exhibit multiple naturally occurring mammary tumors in the same individual. Moreover, canine mammary tumors (CMTs) and human breast cancer have similar histopathological profiles and clinical presentation. We leverage the CMT model to elucidate genome-wide molecular changes clinically relevant to human breast cancer, focusing on signals underlying tumor development. We develop a robust, generally applicable computational analysis framework (FREYA) for analysis of CMTs for comparative oncology. Using FREYA, we RNA-profile 89 samples from 16 dogs and demonstrate that CMTs recapitulate human breast cancer subtypes. We then extract molecular profiles of breast cancer progression at three distinct stages (normal, pre-malignant and malignant) and identify signatures of gene expression reflective of tumor progression. Focusing on the transitions to malignancy, we identify transcriptional patterns and biological pathways specific to malignant tumors and distinct from those characterizing pre-malignant tumors or normal tissue. We find that human breast cancer patients whose tumors exhibit strong CMT malignancy signatures have significantly decreased survival, indicating the importance of the tumor progression processes identified in CMTs to human breast cancer prognosis. Altogether, our comprehensive genomic characterization demonstrates that CMTs are a powerful translational model of breast cancer, providing insights that inform our understanding of tumor development in humans.
To catalyze and support similar analyses and use of the CMT model by other biomedical researchers, we publicly share all of our data and provide FREYA, a robust data processing pipeline and statistical analyses framework, at freya.flatironinstitute.org.


Genomic analyses implicate noncoding de novo variants in congenital heart disease

F Richter, S Morton, S Kim, A Kitaygorodsky, L Wasson, K. Chen

A genetic etiology is identified for one-third of patients with congenital heart disease (CHD), with 8% of cases attributable to coding de novo variants (DNVs). To assess the contribution of noncoding DNVs to CHD, we compared genome sequences from 749 CHD probands and their parents with those from 1,611 unaffected trios. Neural network prediction of noncoding DNV transcriptional impact identified a burden of DNVs in individuals with CHD (n = 2,238 DNVs) compared to controls (n = 4,177; P = 8.7 × 10⁻⁴). Independent analyses of enhancers showed an excess of DNVs in associated genes (27 genes versus 3.7 expected, P = 1 × 10⁻⁵). We observed significant overlap between these transcription-based approaches (odds ratio (OR) = 2.5, 95% confidence interval (CI) 1.1–5.0, P = 5.4 × 10⁻³). CHD DNVs altered transcription levels in 5 of 31 enhancers assayed. Finally, we observed a DNV burden in RNA-binding-protein regulatory sites (OR = 1.13, 95% CI 1.1–1.2, P = 8.8 × 10⁻⁵). Our findings demonstrate an enrichment of potentially disruptive regulatory noncoding DNVs in a fraction of CHD at least as high as that observed for damaging coding DNVs.


Specific viral RNA drives the SARS-CoV-2 nucleocapsid to phase separate

C. Iserman, C. Roden, M. Boerneke, R. Sealfon, G. McLaughlin, I. Jungreis, C. Park, A. Boppana, E. Fritch, Y. Hou, C. Theesfeld, O. Troyanskaya, R. Baric, T. Sheahan, K. Weeks, A. Gladfelter

A mechanistic understanding of the SARS-CoV-2 viral replication cycle is essential to develop new therapies for the COVID-19 global health crisis. In this study, we show that the SARS-CoV-2 nucleocapsid protein (N-protein) undergoes liquid-liquid phase separation (LLPS) with the viral genome, and propose a model of viral packaging through LLPS. N-protein condenses with specific RNA sequences in the first 1000 nts (5’-End) under physiological conditions, and this condensation is enhanced at human upper airway temperatures. N-protein condensates exclude non-packaged RNA sequences. We comprehensively map sites bound by N-protein in the 5’-End and find preferences for single-stranded RNA flanked by stable structured elements. Liquid-like N-protein condensates form in mammalian cells in a concentration-dependent manner and can be altered by small molecules. Condensation of N-protein is sequence- and structure-specific, sensitive to human body temperature, and manipulable with small molecules, thus presenting screenable processes for identifying antiviral compounds effective against SARS-CoV-2.


Genome-wide landscape of RNA-binding protein dysregulation reveals a major impact on psychiatric disorder risk

C. Park, J Zhou, A. Wong, K. Chen, C Theesfeld, R Darnell, O. Troyanskaya

Despite the strong genetic basis of psychiatric disorders, the molecular origins of these diseases are still largely unmapped. RNA-binding proteins (RBPs) are responsible for most post-transcriptional regulation, from splicing to translation to localization. RBPs thus act as key gatekeepers of cellular homeostasis, especially in the brain. Here, we leverage a deep learning approach to interrogate variant effects genome-wide, and discover that the dysregulation of RBP target sites is a principal contributor to psychiatric disorder risk. We show that specific modes of RBP regulation are genetically linked to the heritability of psychiatric disorders, and demonstrate that diverse RBP regulatory functions are reflected in distinct genome-wide negative selection signatures. Notably, RBP dysregulation has a stronger impact on psychiatric disorders than common coding region variants and explains heritability not currently captured by large-scale molecular QTL studies (expression QTLs and splicing QTLs). Through a public web server, we share genome-wide profiles of RBP target site dysregulation, which we used to identify DDHD2 as a candidate schizophrenia risk gene. This resource provides a novel analytical framework to connect the full range of RNA regulation to complex disease.


DeepArk: modeling cis-regulatory codes of model species with deep learning

E Cofer, J Raimundo, A Tadych, Y Yamazaki, A. Wong, C Theesfeld, M Levine, O. Troyanskaya

To enable large-scale analyses of regulatory logic in model species, we developed DeepArk (https://DeepArk.princeton.edu), a set of deep learning models of the cis-regulatory codes of four widely studied species: Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, and Mus musculus. DeepArk accurately predicts the presence of thousands of different context-specific regulatory features, including chromatin states, histone marks, and transcription factors. In vivo studies show that DeepArk can predict the regulatory impact of any genomic variant (including rare or previously unobserved variants) and enables the regulatory annotation of understudied model species.

April 28, 2020

Spatial Transcriptional Mapping of the Human Nephrogenic Program

N Lindström, R. Sealfon, X. Chen, R Parvez, A Ransick, G De Sena Brandine, J Guo, B Hill, T Tran, A Kim, J Zhou, A Tadych, A. Watters, A. Wong, E. Lovero, B Grubbs, M Thornton, J McMahon, A Smith, S Ruffins, C Armit, O. Troyanskaya, A McMahon

defects affecting 3% of newborns. The human kidney develops over a 30-week period in which a nephron progenitor pool gives rise to around a million nephrons. To establish a framework for human nephrogenesis, we spatially resolved a stereotypical process by which equipotent nephron progenitors generate a nephron anlage, then applied data-driven approaches to construct three-dimensional protein maps on anatomical models of the nephrogenic program. Single-cell RNA sequencing identified novel progenitor states, which were spatially mapped to the nephron anatomy, enabling the generation of functional gene networks predicting interactions within and between nephron cell types. Network mining identified known developmental disease genes and predicted new targets of interest. The spatially resolved nephrogenic program made available through the Human Nephrogenesis Atlas (https://sckidney.flatironinstitute.org/) will facilitate an understanding of kidney development and disease, and enhance efforts to generate new kidney structures.

April 28, 2020

Machine learning, the kidney, and genotype–phenotype analysis

R. Sealfon, L Mariani, M Kretzler, O. Troyanskaya

With biomedical research transitioning into data-rich science, machine learning provides a powerful toolkit for extracting knowledge from large-scale biological data sets. The increasing availability of comprehensive kidney omics compendia (transcriptomics, proteomics, metabolomics, and genome sequencing), as well as other data modalities such as electronic health records, digital nephropathology repositories, and renal radiology images, makes machine learning approaches increasingly essential for analyzing human kidney data sets. Here, we discuss how machine learning approaches can be applied to the study of kidney disease, with a particular focus on how they can be used for understanding the relationship between genotype and phenotype.


Subtype-specific transcriptional regulators in breast tumors subjected to genetic and epigenetic alterations

Q Zhu, X Tekpli, O. Troyanskaya, V Kristensen

Motivation
Breast cancer consists of multiple distinct tumor subtypes, and results from epigenetic and genetic aberrations that give rise to distinct transcriptional profiles. Despite previous efforts to understand transcriptional deregulation through transcription factor networks, the transcriptional mechanisms leading to subtypes of the disease remain poorly understood.

Results
We used a sophisticated computational search of thousands of expression datasets to define extended signatures of distinct breast cancer subtypes. Using ENCODE ChIP-seq data from surrogate cell lines and motif analysis, we observed that these subtypes are determined by a distinct repertoire of lineage-specific transcription factors (TFs). Furthermore, we observed specific patterns and abundance of copy number and DNA methylation changes at these TFs and their targets, compared with other genes and with normal cells. Overall, distinct transcriptional profiles are linked to genetic and epigenetic alterations at lineage-specific transcriptional regulators in breast cancer subtypes.
