Epigenomic profiling has enabled large-scale identification of regulatory elements, yet we still lack a systematic mapping from any sequence or variant to regulatory activities. We address this challenge with Sei, a framework for integrating human genetics data with sequence information to discover the regulatory basis of traits and diseases. Sei learns a vocabulary of regulatory activities, called sequence classes, using a deep learning model that predicts 21,907 chromatin profiles across >1,300 cell lines and tissues. Sequence classes provide a global classification and quantification of sequence and variant effects based on diverse regulatory activities, such as cell type-specific enhancer functions. These predictions are supported by tissue-specific expression, expression quantitative trait loci and evolutionary constraint data. Furthermore, sequence classes enable characterization of the tissue-specific, regulatory architecture of complex traits and generate mechanistic hypotheses for individual regulatory pathogenic mutations. We provide Sei as a resource to elucidate the regulatory basis of human health and disease.
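As a rough illustration of what quantifying a variant effect from sequence can look like, the sketch below scores a single-nucleotide change as the difference between a model's output on the alternate and reference sequences. This is a minimal sketch under stated assumptions, not the Sei implementation: the alt-minus-ref scoring is assumed rather than taken from the abstract, the function names are hypothetical, and `toy_model` is a placeholder (GC fraction) standing in for a trained deep learning model.

```python
def one_hot(sequence):
    """Encode a DNA string as rows of [A, C, G, T]; unknown bases become all zeros."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def toy_model(encoded):
    """Placeholder scorer (assumption): returns the GC fraction. This is NOT Sei."""
    gc_count = sum(row[1] + row[2] for row in encoded)
    return gc_count / len(encoded)


def variant_effect(ref_seq, position, alt_base, model=toy_model):
    """Score a single-nucleotide variant as model(alt) - model(ref)."""
    alt_seq = ref_seq[:position] + alt_base + ref_seq[position + 1:]
    return model(one_hot(alt_seq)) - model(one_hot(ref_seq))


# Hypothetical example: an A>G substitution at position 0 of a short sequence.
effect = variant_effect("ACGTACGT", 0, "G")
```

In a real setting the placeholder would be replaced by a trained model that returns many chromatin-profile predictions per sequence, and the difference would be taken per profile.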
The Kidney Precision Medicine Project (KPMP) is building a spatially specified human kidney tissue atlas in health and disease with single-cell resolution. Here, we describe the construction of an integrated reference map of cells, pathways, and genes using unaffected regions of nephrectomy tissues and undiseased human biopsies from 56 adult subjects. We use single-cell/nucleus transcriptomics, subsegmental laser microdissection transcriptomics and proteomics, near-single-cell proteomics, 3D and CODEX imaging, and spatial metabolomics to hierarchically identify genes, pathways, and cells. Integrated data from these different technologies coherently identify cell types/subtypes within different nephron segments and the interstitium. These profiles describe cell-level functional organization of the kidney following its physiological functions and link cell subtypes to genes, proteins, metabolites, and pathways. They further show that messenger RNA levels along the nephron are congruent with the subsegmental physiological activity. This reference atlas provides a framework for the classification of kidney disease when multiple molecular mechanisms underlie convergent clinical phenotypes.
The proto-oncogene DEK regulates neuronal excitability and tau accumulation in Alzheimer’s disease vulnerable neurons
Neurons from layer II of the entorhinal cortex (ECII) are the first to accumulate tau protein aggregates and degenerate during prodromal Alzheimer’s disease. Here, we use a data-driven functional genomics approach to model ECII neurons in silico and identify the proto-oncogene DEK as a potential driver of tau pathology. By modulating DEK levels in EC neurons in vitro and in vivo, we first validate the accuracy and cell-type specificity of our network predictions. We then show that Dek silencing changes the inducibility of immediate early genes and alters neuron excitability, leading to dysregulation of neuronal plasticity genes. We further find that loss of function of DEK leads to tau accumulation in the soma of ECII neurons, reactivity of surrounding microglia, and eventually microglia-mediated neuron loss. This study validates a pathological gene discovery tool that opens new therapeutic avenues and sheds light on a novel pathway driving tau pathology in vulnerable neurons.
Single nucleus transcriptome and chromatin accessibility of postmortem human pituitaries reveal diverse stem cell regulatory mechanisms
Despite their importance in tissue homeostasis and renewal, human pituitary stem cells (PSCs) are incompletely characterized. We describe a human single nucleus RNA-seq and ATAC-seq resource from pediatric, adult, and aged postmortem pituitaries (snpituitaryatlas.princeton.edu) and characterize cell-type-specific gene expression and chromatin accessibility programs for all major pituitary cell lineages. We identify uncommitted PSCs, committing progenitor cells, and sex differences. Pseudotime trajectory analysis indicates that early-life PSCs are distinct from the other age groups. Linear modeling of same-cell multiome data identifies regulatory domain accessibility sites and transcription factors that are significantly associated with gene expression in PSCs compared with other cell types and within PSCs. We identify distinct deterministic mechanisms that contribute to heterogeneous marker expression within PSCs. These findings characterize human stem cell lineages and reveal diverse mechanisms regulating key PSC genes and cell type identity.
Shift in MSL1 alternative polyadenylation in response to DNA damage protects cancer cells from chemotherapeutic agent-induced apoptosis
DNA damage reshapes the cellular transcriptome by modulating RNA transcription and processing. In cancer cells, these changes can alter the expression of genes in the immune surveillance and cell death pathways. Here, we investigate how DNA damage impacts alternative polyadenylation (APA) using the PAPERCLIP technique. We find that APA shifts are a coordinated response for hundreds of genes to DNA damage, and we identify PCF11 as an important contributor to DNA damage-induced APA shifts. One of these APA shifts results in upregulation of the full-length MSL1 mRNA isoform, which protects cells from DNA damage-induced apoptosis and promotes cell survival after exposure to DNA-damaging agents. Importantly, blocking MSL1 upregulation enhances cytotoxicity of chemotherapeutic agents even in the absence of p53 and overcomes chemoresistance. Our study demonstrates that characterizing adaptive APA shifts to DNA damage has therapeutic implications and reveals a link between PCF11, the MSL complex, and DNA damage-induced apoptosis.
Attenuated activation of pulmonary immune cells in mRNA-1273–vaccinated hamsters after SARS-CoV-2 infection
The mRNA-1273 vaccine is effective against SARS-CoV-2 and was granted emergency use authorization by the FDA. Clinical studies, however, cannot provide the controlled response to infection and complex immunological insight that are only possible with preclinical studies. Hamsters are the only model that reliably exhibits severe SARS-CoV-2 disease similar to that in hospitalized patients, making them pertinent for vaccine evaluation. We demonstrate that prime or prime-boost administration of mRNA-1273 in hamsters elicited robust neutralizing antibodies, ameliorated weight loss, suppressed SARS-CoV-2 replication in the airways, and better protected against disease at the highest prime-boost dose. Unlike in mice and nonhuman primates, low-level virus replication in mRNA-1273–vaccinated hamsters coincided with an anamnestic response. Single-cell RNA sequencing of lung tissue permitted high-resolution analysis that is not possible in vaccinated humans. mRNA-1273 prevented inflammatory cell infiltration and the reduction of lymphocyte proportions, but enabled antiviral responses conducive to lung homeostasis. Surprisingly, infection triggered transcriptome programs in some types of immune cells from vaccinated hamsters that were shared, albeit attenuated, with mock-vaccinated hamsters. Our results support the use of mRNA-1273 in a 2-dose schedule and provide insight into the potential responses within the lungs of vaccinated humans who are exposed to SARS-CoV-2.
Interpreting the effects of genetic variants is key to understanding individual susceptibility to disease and designing personalized therapeutic approaches. Modern experimental technologies are enabling the generation of massive compendia of human genome sequence data and associated molecular and phenotypic traits, together with genome-scale expression, epigenomics and other functional genomic data. Integrative computational models can leverage these data to understand variant impact, elucidate the effect of dysregulated genes on biological pathways in specific disease and tissue contexts, and interpret disease risk beyond what is feasible with experiments alone. In this Review, we discuss recent developments in machine learning algorithms for genome interpretation and for integrative molecular-level modelling of cells, tissues and organs relevant to disease. More specifically, we highlight existing methods and key challenges and opportunities in identifying specific disease-causing genetic variants and linking them to molecular pathways and, ultimately, to disease phenotypes.
Transcription factors (TFs) often function as a module including both master factors and mediators binding at cis-regulatory regions to modulate nearby gene transcription. ChIP-seq profiling of multiple TFs makes it feasible to infer functional TF modules. However, when inferring TF modules based on co-localization of ChIP-seq peaks, often many weak binding events are missed, especially for mediators, resulting in incomplete identification of modules. To address this problem, we develop a ChIP-seq data-driven Gibbs Sampler to infer Modules (ChIP-GSM) using a Bayesian framework that integrates ChIP-seq profiles of multiple TFs. ChIP-GSM samples read counts of module TFs iteratively to estimate the binding potential of a module to each region and, across all regions, estimates the module abundance. Using inferred module-region probabilistic bindings as feature units, ChIP-GSM then employs logistic regression to predict active regulatory elements. Validation of ChIP-GSM predicted regulatory regions on multiple independent datasets sharing the same context confirms the advantage of using TF modules for predicting regulatory activity. In a case study of K562 cells, we demonstrate that the ChIP-GSM inferred modules form as groups, activate gene expression at different time points, and mediate diverse functional cellular processes. Hence, ChIP-GSM infers biologically meaningful TF modules and improves the prediction accuracy of regulatory region activities.
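The final prediction step described above, logistic regression on module-region binding probabilities, can be sketched as follows. This is a minimal illustration under stated assumptions, not the ChIP-GSM code: the Gibbs sampling stage is omitted, the binding probabilities and activity labels are made-up toy values, and the function names are hypothetical. Plain batch gradient descent is used here only to keep the example dependency-free.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Probability that a region is active, given its module binding features."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict_proba(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy data (assumption): rows are regions, columns are binding
# probabilities for two hypothetical TF modules.
module_binding = [
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
    [0.2, 0.1], [0.1, 0.4], [0.3, 0.2],
]
is_active = [1, 1, 1, 0, 0, 0]

weights, bias = fit_logistic(module_binding, is_active)
p_bound = predict_proba([0.9, 0.1], weights, bias)    # strongly bound by module 1
p_unbound = predict_proba([0.1, 0.4], weights, bias)  # weakly bound by module 1
```

In this toy set only the first module separates active from inactive regions, so the fitted weight on it is positive and `p_bound` exceeds `p_unbound`.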
Experimental approaches to study tissue specificity enable insight into the nature and organization of the cell types and tissues that constitute complex multicellular organisms. Machine learning provides a powerful tool to investigate and interpret tissue-specific experimental data. In this Review, we first provide a brief introduction to key single-cell and whole-tissue approaches that allow investigation of tissue specificity and then highlight two classes of machine-learning-based methods, which can be applied to analyse, model and interpret these experimental data. Deep learning methods can predict tissue-dependent effects of individual mutations on gene expression, alternative splicing and disease phenotypes. Network-based approaches can capture relationships between biomolecules, integrate large heterogeneous data compendia to model molecular circuits and identify tissue-specific functional relationships and regulatory connections. We conclude with an outlook to future possibilities in examining multicellular complexity by combining high-resolution, large-scale multiomics data sets and interpretable machine learning models.
CROTON: an automated and variant-aware deep learning framework for predicting CRISPR/Cas9 editing outcomes
CRISPR/Cas9 is a revolutionary gene-editing technology that has been widely utilized in biology, biotechnology and medicine. CRISPR/Cas9 editing outcomes depend on local DNA sequences at the target site and are thus predictable. However, existing prediction methods are dependent on both feature and model engineering, which restricts their performance to existing knowledge about CRISPR/Cas9 editing