159 Publications

Interpretable neural architecture search and transfer learning for understanding CRISPR–Cas9 off-target enzymatic reactions

Finely-tuned enzymatic pathways control cellular processes, and their dysregulation can lead to disease. Creating predictive and interpretable models for these pathways is challenging because of the complexity of the pathways and of the cellular and genomic contexts. Here we introduce Elektrum, a deep learning framework that addresses these challenges with data-driven and biophysically interpretable models for determining the kinetics of biochemical systems. First, it uses in vitro kinetic assays to rapidly hypothesize an ensemble of high-quality Kinetically Interpretable Neural Networks (KINNs) that predict reaction rates. It then employs a novel transfer learning step, where the KINNs are inserted as intermediary layers into deeper convolutional neural networks, fine-tuning the predictions for reaction-dependent in vivo outcomes. Elektrum makes effective use of the limited but clean in vitro data and the complex yet plentiful in vivo data that captures cellular context. We apply Elektrum to predict CRISPR-Cas9 off-target editing probabilities and demonstrate that Elektrum achieves state-of-the-art performance, regularizes neural network architectures, and maintains physical interpretability.

Atlas of primary cell-type-specific sequence models of gene expression and variant effects

Ksenia Sokolova, Chandra L. Theesfeld, A. Wong, O. Troyanskaya, et al.

Human biology is rooted in highly specialized cell types programmed by a common genome, 98% of which is outside of genes. Genetic variation in the enormous noncoding space is linked to the majority of disease risk. To address the problem of linking these variants to expression changes in primary human cells, we introduce ExPectoSC, an atlas of modular deep-learning-based models for predicting cell-type-specific gene expression directly from sequence. We provide models for 105 primary human cell types covering 7 organ systems, demonstrate their accuracy, and then apply them to prioritize relevant cell types for complex human diseases. The resulting atlas of sequence-based gene expression and variant effects is publicly available in a user-friendly interface and readily extensible to any primary cell types. We demonstrate the accuracy of our approach through systematic evaluations and apply the models to prioritize ClinVar clinical variants of uncertain significance, verifying our top predictions experimentally.

Mapping disease regulatory circuits at cell-type resolution from single-cell multiomics data

Resolving chromatin remodeling-linked gene expression changes at cell type resolution is important for understanding disease states. We describe MAGICAL, a hierarchical Bayesian approach that leverages paired scRNA-seq and scATAC-seq data from different conditions to map disease-associated transcription factors, chromatin sites, and genes as regulatory circuits. By simultaneously modeling signal variation across cells and conditions in both omics data types, MAGICAL achieved high accuracy on circuit inference. We applied MAGICAL to study Staphylococcus aureus sepsis from peripheral blood mononuclear single-cell data that we generated from infected subjects with bloodstream infection and from uninfected controls. MAGICAL identified sepsis-associated regulatory circuits predominantly in CD14 monocytes, known to be activated by bacterial sepsis. We addressed the challenging problem of distinguishing host regulatory circuit responses to methicillin-resistant (MRSA) and methicillin-susceptible (MSSA) Staphylococcus aureus infections. While differential expression analysis failed to show predictive value, MAGICAL identified epigenetic circuit biomarkers that distinguished MRSA from MSSA.

June 6, 2023

Cultured Renal Proximal Tubular Epithelial Cells Resemble a Stressed/Damaged Kidney While Supporting BK Virus Infection

Ping An, Maria Teresa Sáenz Robles, R. Sealfon, et al.

BK virus (BKV; human polyomavirus 1) infections are asymptomatic in most individuals, and the virus persists throughout life without harm. However, BKV is a threat to transplant patients and those with immunosuppressive disorders. Under these circumstances, the virus can replicate robustly in proximal tubule epithelial cells (PT). Cultured renal proximal tubule epithelial cells (RPTE) are permissive to BKV and have been used extensively to characterize different aspects of BKV infection. Recently, lines of hTERT-immortalized RPTE have become available, and preliminary studies indicate they support BKV infection as well. Our results indicate that BKV infection leads to a similar response in primary and immortalized RPTE. In addition, we examined the patterns of global gene expression of primary and immortalized RPTE and compared them with uncultured PT freshly dissociated from human kidney. As expected, PT isolated from the healthy kidney express a number of differentiation-specific genes that are associated with kidney function. However, the expression of most of these genes is absent or repressed in cultured RPTE. Rather, cultured RPTE exhibit a gene expression profile indicative of a stressed or injured kidney. Inoculation of cultured RPTE with BKV results in the suppression of many genes associated with kidney stress. In summary, this study demonstrated similar global gene expression patterns and responses to BKV infection between primary and immortalized RPTE. Moreover, results from bulk transcriptome sequencing (RNA-seq) and SCT experiments revealed distinct transcriptomic signatures representing cell injury and stress in primary RPTE in contrast to the uncultured, freshly dissociated PT from human kidney.

A methylation clock model of mild SARS-CoV-2 infection provides insight into immune dysregulation

DNA methylation comprises a cumulative record of lifetime exposures superimposed on genetically determined markers. Little is known about methylation dynamics in humans following an acute perturbation, such as infection. We characterized the temporal trajectory of blood epigenetic remodeling in 133 participants in a prospective study of young adults before, during, and after asymptomatic and mildly symptomatic SARS-CoV-2 infection. The differential methylation caused by asymptomatic or mildly symptomatic infections was indistinguishable. While differential gene expression largely returned to baseline levels after the virus became undetectable, some differentially methylated sites persisted for months of follow-up, with a pattern resembling autoimmune or inflammatory disease. We leveraged these responses to construct methylation-based machine learning models that distinguished samples from pre-, during-, and postinfection time periods, and quantitatively predicted the time since infection. The clinical trajectory in the young adults and in a diverse cohort with more severe outcomes was predicted by the similarity of methylation before or early after SARS-CoV-2 infection to the model-defined postinfection state. Unlike the phenomenon of trained immunity, the postacute SARS-CoV-2 epigenetic landscape we identify is antiprotective.

Blood RNA alternative splicing events as diagnostic biomarkers for infectious disease

Assays detecting blood transcriptome changes are studied for infectious disease diagnosis. Blood-based RNA alternative splicing (AS) events, which have not been well characterized in pathogen infection, have potential normalization and assay platform stability advantages over gene expression for diagnosis. Here, we present a computational framework for developing AS diagnostic biomarkers. Leveraging a large prospective cohort of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection and whole-blood RNA sequencing (RNA-seq) data, we identify a major functional AS program switch upon viral infection. Using an independent cohort, we demonstrate the improved accuracy of AS biomarkers for SARS-CoV-2 diagnosis compared with six reported transcriptome signatures. We then optimize a subset of AS-based biomarkers to develop microfluidic PCR diagnostic assays. This assay achieves nearly perfect test accuracy (61/62 = 98.4 percent) using a naive principal component classifier, significantly more accurate than a gene expression PCR assay in the same cohort. Therefore, our RNA splicing computational framework enables a promising avenue for host-response diagnosis of infection.

Oral mucosal breaks trigger anti-citrullinated bacterial and human protein antibody responses in rheumatoid arthritis

R. Camille Brewer, Tobias V. Lanz, O. Troyanskaya, et al.

Periodontal disease is more common in individuals with rheumatoid arthritis (RA) who have detectable anti-citrullinated protein antibodies (ACPAs), implicating oral mucosal inflammation in RA pathogenesis. Here, we performed paired analysis of human and bacterial transcriptomics in longitudinal blood samples from RA patients. We found that patients with RA and periodontal disease experienced repeated oral bacteremias associated with transcriptional signatures of ISG15+HLADRhi and CD48highS100A2pos monocytes, recently identified in inflamed RA synovia and blood of those with RA flares. The oral bacteria observed transiently in blood were broadly citrullinated in the mouth, and their in situ citrullinated epitopes were targeted by extensively somatically hypermutated ACPAs encoded by RA blood plasmablasts. Together, these results suggest that (i) periodontal disease results in repeated breaches of the oral mucosa that release citrullinated oral bacteria into circulation, which (ii) activate inflammatory monocyte subsets that are observed in inflamed RA synovia and blood of RA patients with flares and (iii) activate ACPA B cells, thereby promoting affinity maturation and epitope spreading to citrullinated human antigens.

SARS-CoV-2 Outbreak Dynamics in an Isolated US Military Recruit Training Center With Rigorous Prevention Measures

Rhonda A. Lizewski, R. Sealfon, O. Troyanskaya, et al.

Marine recruits training at Parris Island experienced an unexpectedly high rate of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, despite preventive measures including a supervised, 2-week, pre-entry quarantine. We characterize SARS-CoV-2 transmission in this cohort.

Identifying genes and pathways linking astrocyte regional specificity to Alzheimer’s disease susceptibility

Ran Zhang, Margarete Knudsen, O. Troyanskaya, et al.

Astrocytes have been shown to play a central role in Alzheimer’s disease (AD). However, the genes and biological pathways underlying disease manifestation are unknown, and it is unclear whether regional molecular differences among astrocytes contribute to regional specificity of disease. Here, we began to address these challenges with integrated experimental and computational approaches. We constructed a human astrocyte-specific functional gene network using Bayesian integration of a large compendium of human functional genomics data, as well as regional astrocyte gene expression profiles we generated in the mouse. This network identifies likely region-specific astrocyte pathways that operate in healthy brains. We leveraged our findings to compile genome-wide astrocyte-associated disease-gene predictions, employing a novel network-guided differential expression analysis (NetDIFF). We also used these data to predict a list of astrocyte-expressed genes mediating region-specific human disease, using a network-guided shortest path method (NetPATH). Both the network and our results are publicly available through an interactive web interface at http://astrocyte.princeton.edu. Our experimental and computational studies propose a strategy for disease gene and pathway prediction that may be applied to a host of human neurological disorders.

Pre-infection antiviral innate immunity contributes to sex differences in SARS-CoV-2 infection

Male sex is a major risk factor for SARS-CoV-2 infection severity. To understand the basis for this sex difference, we studied SARS-CoV-2 infection in a young adult cohort of United States Marine recruits. Among 2,641 male and 244 female unvaccinated and seronegative recruits studied longitudinally, SARS-CoV-2 infections occurred in 1,033 males and 137 females. We identified sex differences in symptoms, viral load, blood transcriptome, RNA splicing, and proteomic signatures. Females had higher pre-infection expression of antiviral interferon-stimulated gene (ISG) programs. Causal mediation analysis implicated ISG differences in number of symptoms, levels of ISGs, and differential splicing of CD45 lymphocyte phosphatase during infection. Our results indicate that the antiviral innate immunity set point causally contributes to sex differences in response to SARS-CoV-2 infection. A record of this paper’s transparent peer review process is included in the supplemental information.

November 1, 2022