140 Publications

An automated framework for efficiently designing deep convolutional neural networks in genomics

Convolutional neural networks (CNNs) have become a standard for analysis of biological sequences. Tuning of network architectures is essential for a CNN’s performance, yet it requires substantial knowledge of machine learning and commitment of time and effort. This process thus imposes a major barrier to broad and effective application of modern deep learning in genomics. Here we present Automated Modelling for Biological Evidence-based Research (AMBER), a fully automated framework to efficiently design and apply CNNs for genomic sequences. AMBER designs optimal models for user-specified biological questions through the state-of-the-art neural architecture search (NAS). We applied AMBER to the task of modelling genomic regulatory features and demonstrated that the predictions of the AMBER-designed model are significantly more accurate than the equivalent baseline non-NAS models and match or even exceed published expert-designed models. Interpretation of AMBER architecture search revealed its design principles of utilizing the full space of computational operations for accurately modelling genomic sequences. Furthermore, we illustrated the use of AMBER to accurately discover functional genomic variants in allele-specific binding and disease heritability enrichment. AMBER provides an efficient automated method for designing accurate deep learning models in genomics.

AMBIENT: Accelerated Convolutional Neural Network Architecture Search for Regulatory Genomics

Convolutional neural networks (CNNs) have become a standard approach for modeling genomic sequences. CNNs can be effectively built with Neural Architecture Search (NAS), which trades computing power for accurate neural architectures. Yet, the consumption of immense computing power is a major practical, financial, and environmental issue for deep learning. Here, we present a novel NAS framework, AMBIENT, that generates highly accurate CNN architectures for biological sequences of diverse functions, while substantially reducing the computing cost of conventional NAS.

February 27, 2021

mRNA-1273 efficacy in a severe COVID-19 model: attenuated activation of pulmonary immune cells after challenge

M. Meyer, Y. Wang, D. Edwards, G. Smith, A. Rubenstein, P. Ramanathan, C. Mire, C. Pietzch, X. Chen, Y. Ge, W. Cheng, C. Henry, A. Woods, L. Ma, G. Stewart-Jones, K. Bock, M. Minai, B. Nagata, S. Periasamy, P. Shi, B. Graham, I. Moore, I. Ramos, O. Troyanskaya, E. Zaslavsky, A. Carfi, S. Sealfon, A. Bukreyev

The mRNA-1273 vaccine was recently determined to be effective against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) from interim Phase 3 results. Human studies, however, cannot provide the controlled response to infection and complex immunological insight that are only possible with preclinical studies. Hamsters are the only model that reliably exhibits more severe SARS-CoV-2 disease similar to that seen in hospitalized patients, making them pertinent for vaccine evaluation. We demonstrate that prime or prime-boost administration of mRNA-1273 in hamsters elicited robust neutralizing antibodies, ameliorated weight loss, suppressed SARS-CoV-2 replication in the airways, and better protected against disease at the highest prime-boost dose. Unlike in mice and non-human primates, mRNA-1273-mediated immunity was non-sterilizing and coincided with an anamnestic response. Single-cell RNA sequencing of lung tissue permitted high-resolution analysis, which is not possible in vaccinated humans. mRNA-1273 prevented inflammatory cell infiltration and the reduction of lymphocyte proportions, but enabled antiviral responses conducive to lung homeostasis. Surprisingly, infection triggered transcriptome programs in some types of immune cells from vaccinated hamsters that were shared, albeit attenuated, with mock-vaccinated hamsters. Our results support the use of mRNA-1273 in a two-dose schedule and provide insight into the potential responses within the lungs of vaccinated humans who are exposed to SARS-CoV-2.

January 25, 2021

Genome-wide landscape of RNA-binding protein target site dysregulation reveals a major impact on psychiatric disorder risk

C. Park, J. Zhou, A. Wong, K. Chen, C. Theesfeld, R. Darnell, O. Troyanskaya

Despite the strong genetic basis of psychiatric disorders, the underlying molecular mechanisms are largely unmapped. RNA-binding proteins (RBPs) are responsible for most post-transcriptional regulation, from splicing to translation to localization. RBPs thus act as key gatekeepers of cellular homeostasis, especially in the brain. However, quantifying the pathogenic contribution of noncoding variants impacting RBP target sites is challenging. Here, we leverage a deep learning approach that can accurately predict the RBP target site dysregulation effects of mutations and discover that RBP dysregulation is a principal contributor to psychiatric disorder risk. RBP dysregulation explains a substantial amount of heritability not captured by large-scale molecular quantitative trait loci studies and has a stronger impact than common coding region variants. We share the genome-wide profiles of RBP dysregulation, which we use to identify DDHD2 as a candidate schizophrenia risk gene. This resource provides a new analytical framework to connect the full range of RNA regulation to complex disease.

Nature Genetics, 53(2): 166-173
January 18, 2021

A Multimodal and Integrated Approach to Interrogate Human Kidney Biopsies with Rigor and Reproducibility: Guidelines from the Kidney Precision Medicine Project

T. El-Achkar, C. Park, R. Sealfon, O. Troyanskaya, et al.

Comprehensive and spatially mapped molecular atlases of organs at a cellular level are a critical resource to gain insights into pathogenic mechanisms and personalized therapies for diseases. The Kidney Precision Medicine Project (KPMP) is an endeavor to generate 3-dimensional (3D) molecular atlases of healthy and diseased kidney biopsies using multiple state-of-the-art OMICS and imaging technologies across several institutions. Obtaining rigorous and reproducible results from disparate methods and at different sites to interrogate biomolecules at a single cell level or in 3D space is a significant challenge that can be a futile exercise if not well controlled. We describe a "follow the tissue" pipeline for generating a reliable and authentic single cell/region 3D molecular atlas of human adult kidney. Our approach emphasizes quality assurance, quality control, validation and harmonization across different OMICS and imaging technologies from sample procurement, processing, storage, shipping to data generation, analysis and sharing. We established benchmarks for quality control, rigor, reproducibility and feasibility across multiple technologies through a pilot experiment using common source tissue that was processed and analyzed at different institutions and different technologies. A peer review system was established to critically review quality control measures and the reproducibility of data generated by each technology before being approved to interrogate clinical biopsy specimens. The process established economizes the use of valuable biopsy tissue for multi-OMICS and imaging analysis with stringent quality control to ensure rigor and reproducibility of results and serves as a model for precision medicine projects across laboratories, institutions and consortia.

Identifying intracellular signaling modules and exploring pathways associated with breast cancer recurrence

X. Chen, J. Gu, A. Neuwald, L. Hilakivi-Clarke, R. Clarke, J. Xuan

Exploring complex modularization of intracellular signal transduction pathways is critical to understanding aberrant cellular responses during disease development and drug treatment. IMPALA (Inferred Modularization of PAthway LAndscapes) integrates information from high-throughput gene expression experiments and genome-scale knowledge databases to identify aberrant pathway modules, thereby providing a powerful sampling strategy to reconstruct and explore pathway landscapes. Here IMPALA identifies pathway modules associated with breast cancer recurrence and Tamoxifen resistance. Focusing on estrogen-receptor (ER) signaling, IMPALA identifies alternative pathways from gene expression data of Tamoxifen-treated ER-positive breast cancer patient samples. These pathways were often interconnected through cytoplasmic genes such as IRS1/2, JAK1, YWHAZ, CSNK2A1, MAPK1 and HSP90AA1 and significantly enriched with ErbB, MAPK, and JAK-STAT signaling components. Characterization of the pathway landscape revealed key modules associated with ER signaling and with cell cycle and apoptosis signaling. We validated IMPALA-identified pathway modules using data from four different breast cancer cell lines including sensitive and resistant models to Tamoxifen. Results showed that a majority of genes in cell cycle/apoptosis modules that were up-regulated in breast cancer patients with short survivals (< 5 years) were also over-expressed in drug-resistant cell lines, whereas the transcription factors JUN, FOS, and STAT3 were down-regulated in both patient and drug-resistant cell lines. Hence, IMPALA-identified pathways were associated with Tamoxifen resistance and an increased risk of breast cancer recurrence. The IMPALA package is available at https://dlrl.ece.vt.edu/software/.

Scientific Reports, 11(1): 385
January 11, 2021

Capturing the complexity of topologically associating domains through multi-feature optimization

N. Sauerwald, C. Kingsford

The three-dimensional structure of human chromosomes is tied to gene regulation and replication timing, but there is still a lack of consensus on the computational and biological definitions for chromosomal substructures such as topologically associating domains (TADs). TADs are described and identified by various computational properties leading to different TAD sets with varying compatibility with biological properties such as boundary occupancy of structural proteins. We unify many of these computational and biological targets into one algorithmic framework that jointly maximizes several computational TAD definitions and optimizes TAD selection for a quantifiable biological property. Using this framework, we explore the variability of TAD sets optimized for six different desirable properties of TAD sets: high occupancy of CTCF, RAD21, and H3K36me3 at boundaries, reproducibility between replicates, high intra- vs inter-TAD difference in contact frequencies, and many CTCF binding sites at boundaries. The compatibility of these biological targets varies by cell type, and our results suggest that these properties are better reflected as subpopulations or families of TADs rather than a singular TAD set fitting all TAD definitions and properties. We explore the properties that produce similar TAD sets (reproducibility and inter- vs intra-TAD difference, for example) and those that lead to very different TADs (such as CTCF binding sites and inter- vs intra-TAD contact frequency difference).

January 5, 2021