CCB: Publications

SARS-CoV-2 titers in wastewater foreshadow dynamics and clinical presentation of new COVID-19 cases

F Wu, A Xiao, J Zhang, K Moniz, N Endo, F Armas, R. Bonneau, M Brown, M Bushman, P Chai, C Duvallet, T Erickson, K Foppe, N Ghaeli, X Gu, W Hanage, K Huang, W Lee, M Matus, K McElroy, J Nagler, S Rhode, M Santillana, J Tucker, S Wuertz, S Zhao, J Thompson, E Alm

Current estimates of COVID-19 prevalence are largely based on symptomatic, clinically diagnosed cases. The existence of a large number of undiagnosed infections hampers population-wide investigation of viral circulation. Here, we use longitudinal wastewater analysis to track SARS-CoV-2 dynamics in wastewater at a major urban wastewater treatment facility in Massachusetts, between early January and May 2020. SARS-CoV-2 was first detected in wastewater on March 3. Viral titers in wastewater increased exponentially from mid-March to mid-April, after which they began to decline. Viral titers in wastewater correlated with clinically diagnosed new COVID-19 cases, with the trends appearing 4-10 days earlier in wastewater than in clinical data. We inferred viral shedding dynamics by modeling wastewater viral titers as a convolution of back-dated new clinical cases with the viral shedding function of an individual. The inferred viral shedding function showed an early peak, likely before symptom onset and clinical diagnosis, consistent with emerging clinical and experimental evidence. Finally, we found that wastewater viral titers at the neighborhood level correlate better with demographic variables than with population size. This work suggests that longitudinal wastewater analysis can be used to identify trends in disease transmission in advance of clinical case reporting, and may shed light on infection characteristics that are difficult to capture in clinical investigations, such as early viral shedding dynamics.
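
The convolution described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `predicted_titer`, the `backdate_days` parameter, and the toy shedding curve are all assumptions made for illustration. The predicted titer on day t is taken as the sum over days s of (back-dated new cases on day t - s) times (shedding s days after infection).

```python
import numpy as np

def predicted_titer(new_cases, shedding, backdate_days=0):
    """Convolve back-dated daily new cases with a per-person shedding curve.

    new_cases     : daily counts of newly diagnosed cases
    shedding      : hypothetical viral shedding per infected person, indexed
                    by days since infection
    backdate_days : assumed lag between infection and diagnosis
    """
    new_cases = np.asarray(new_cases, dtype=float)
    shedding = np.asarray(shedding, dtype=float)
    # Back-date: move each case count `backdate_days` earlier, zero-padding the end.
    shifted = np.concatenate([new_cases[backdate_days:], np.zeros(backdate_days)])
    # Full discrete convolution, truncated to the observation window.
    return np.convolve(shifted, shedding)[: len(new_cases)]

# Toy input: one case diagnosed on day 2, shedding that peaks on the day of infection.
titers = predicted_titer([0, 0, 1, 0, 0], shedding=[3, 1], backdate_days=1)
```

With a one-day back-dating, the modeled signal appears in wastewater a day before the case is diagnosed, which is the qualitative lead the abstract reports.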

High-Resolution Longitudinal Dynamics of the Cystic Fibrosis Sputum Microbiome and Metabolome through Antibiotic Therapy

R. Raghuvanshi, K. Vasco, Y. Vázquez-Baeza, L. Jiang, J. Morton, D. Li, A. Gonzalez, L. DeRight Goldasich, G. Humphrey, G. Ackerman, A. Swafford, D. Conrad, R. Knight, P. Dorrestein, R. Quinn

Microbial diversity in the cystic fibrosis (CF) lung decreases over decades as pathogenic bacteria such as Pseudomonas aeruginosa take over. The dynamics of the CF microbiome and metabolome over shorter time frames, however, remain poorly studied. Here, we analyze paired microbiome and metabolome data from 594 sputum samples collected over 401 days from six adult CF subjects (subject mean = 179 days) through periods of clinical stability and 11 CF pulmonary exacerbations (CFPE). While microbiome profiles were personalized (permutational multivariate analysis of variance [PERMANOVA] r² = 0.79, P < 0.001), we observed significant intraindividual temporal variation that was highest during clinical stability (linear mixed-effects [LME] model, P = 0.002). This included periods where the microbiomes of different subjects became highly similar (UniFrac distance, <0.05). There was a linear increase in the microbiome alpha-diversity and in the log ratio of anaerobes to pathogens with time (n = 14 days) during the development of a CFPE (LME P = 0.0045 and P = 0.029, respectively). Collectively, comparing samples across disease states showed there was a reduction of these two measures during antibiotic treatment (LME P = 0.0096 and P = 0.014, respectively), but the stability data and CFPE data were not significantly different from each other. Metabolome alpha-diversity was higher during CFPE than during stability (LME P = 0.0085), but no consistent metabolite signatures of CFPE across subjects were identified. Virulence-associated metabolites from P. aeruginosa were temporally dynamic but were not associated with any disease state. One subject died during the collection period, enabling a detailed look at changes in the 194 days prior to death. This subject had over 90% Pseudomonas in the microbiome at the beginning of sampling, and that level gradually increased to over 99% prior to death.
This study revealed that the CF microbiome and metabolome of some subjects are dynamic through time. Future work is needed to understand what drives these temporal dynamics and whether a reduction of anaerobes correlates with clinical response to CFPE therapy.

Structure-Based Protein Function Prediction using Graph Convolutional Networks

V. Gligorijevic, D. Renfrew, T Kosciolek, J. Koehler, D. Berenberg, T Vatanen, C Chandler, B Taylor, I. Fisk, H Vlamakis, R Xavier, R Knight, K Cho, R. Bonneau

The large number of available sequences and the diversity of protein functions challenge current experimental and computational approaches to determining and predicting protein function. We present a deep learning Graph Convolutional Network (GCN) for predicting protein functions and concurrently identifying functionally important residues. This model is initially trained using experimentally determined structures from the Protein Data Bank (PDB) but has significant de-noising capability, with only a minor drop in performance observed when structure predictions are used. We take advantage of this denoising property to train the model on > 200,000 protein structures, including many homology-predicted structures, greatly expanding the reach and applications of the method. Our model learns general structure-function relationships by robustly predicting functions of proteins with ≤ 40% sequence identity to the training set. We show that our GCN architecture predicts functions more accurately than Convolutional Neural Networks trained on sequence data alone and previous competing methods. Using class activation mapping, we automatically identify structural regions at the residue-level that lead to each function prediction for every confidently predicted protein, advancing site-specific function prediction. We use our method to annotate PDB and SWISS-MODEL proteins, making several new confident function predictions spanning both fold and function classifications.
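
For readers unfamiliar with graph convolutions, the sketch below shows one generic layer of the commonly used symmetric-normalized form, applied to a residue contact map. It is an assumption-laden illustration, not the architecture from the paper: the function name `gcn_layer`, the toy three-residue chain, and the choice of normalization are all hypothetical.

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adjacency : (n, n) binary residue contact map (assumed symmetric)
    features  : (n, d_in) per-residue feature matrix
    weights   : (d_in, d_out) weight matrix (learned in a real model)
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))    # degree normalization
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)

# Toy example: a three-residue chain in which neighbors are in contact.
contacts = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
hidden = gcn_layer(contacts, np.eye(3), np.ones((3, 2)))
```

Each output row mixes a residue's features with those of the residues it contacts, which is how structural context enters the per-residue representation.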

Inference of Bacterial Small RNA Regulatory Networks and Integration with Transcription Factor-Driven Regulatory Networks

M Arrieta Ortiz, C Hafemeister, B Shuster, N Baliga, R. Bonneau

Small noncoding RNAs (sRNAs) are key regulators of bacterial gene expression. Through complementary base pairing, sRNAs affect mRNA stability and translation efficiency. Here, we describe a network inference approach designed to identify sRNA-mediated regulation of transcript levels. We use existing transcriptional data sets and prior knowledge to infer sRNA regulons using our network inference tool, the Inferelator. This approach produces genome-wide gene regulatory networks that include contributions by both transcription factors and sRNAs. We show the benefits of estimating and incorporating sRNA activities into network inference pipelines using available experimental data. We also demonstrate how these estimated sRNA regulatory activities can be mined to identify the experimental conditions where sRNAs are most active. We uncover 45 novel experimentally supported sRNA-mRNA interactions in Escherichia coli, outperforming previous network-based efforts. Additionally, our pipeline complements sequence-based sRNA-mRNA interaction prediction methods by adding a data-driven filtering step. Finally, we show the general applicability of our approach by identifying 24 novel, experimentally supported, sRNA-mRNA interactions in Pseudomonas aeruginosa, Staphylococcus aureus, and Bacillus subtilis. Overall, our strategy generates novel insights into the functional context of sRNA regulation in multiple bacterial species.

Macromolecular modeling and design in Rosetta: recent methods and frameworks

J. Koehler, B Weitzner, R. Bonneau

Better together: Elements of successful scientific software development in a distributed collaborative community

J. Koehler, B Weitzner, D. Renfrew, S Lewis, R Moretti, A Watkins, V. Mulligan, S Lyskov, J Adolf-Bryfogle, J Labonte, J Krys, Rosetta Commons Consortium, W Schief, D Gront, O Schueler-Furman, D Baker, J Gray, R Dunbrack, T Kortemme, A Leaver-Fay, C Strauss, J Meiler, B Kuhlman, J Gray, R. Bonneau

Many scientific disciplines rely on computational methods for data analysis, model generation, and prediction. Implementing these methods is often accomplished by researchers with domain expertise but without formal training in software engineering or computer science. This arrangement has led to underappreciation of sustainability and maintainability of scientific software tools developed in academic environments. Some software tools have avoided this fate, including the scientific library Rosetta. We use this software and its community as a case study to show how modern software development can be accomplished successfully, irrespective of subject area. Rosetta is one of the largest software suites for macromolecular modeling, with 3.1 million lines of code and many state-of-the-art applications. Since the mid 1990s, the software has been developed collaboratively by the RosettaCommons, a community of academics from over 60 institutions worldwide with diverse backgrounds including chemistry, biology, physiology, physics, engineering, mathematics, and computer science. Developing this software suite has provided us with more than two decades of experience in how to effectively develop advanced scientific software in a global community with hundreds of contributors. Here we illustrate the functioning of this development community by addressing technical aspects (like version control, testing, and maintenance), community-building strategies, diversity efforts, software dissemination, and user support. We demonstrate how modern computational research can thrive in a distributed collaborative community. The practices described here are independent of subject area and can be readily adopted by other software development communities.

Visualizing ’omic feature rankings and log-ratios using Qurro

M. Fedarko, C. Martino, J. Morton, A. González, G. Rahman, C. Marotz, J. Minich, E. Allen, R. Knight

Many tools for dealing with compositional ’omics data produce feature-wise values that can be ranked in order to describe features’ associations with some sort of variation. These values include differentials (which describe features’ associations with specified covariates) and feature loadings (which describe features’ associations with variation along a given axis in a biplot). Although prior work has discussed the use of these ‘rankings’ as a starting point for exploring the log-ratios of particularly high- or low-ranked features, such exploratory analyses have previously been done using custom code to visualize feature rankings and the log-ratios of interest. This approach is laborious, prone to errors and raises questions about reproducibility. To address these problems we introduce Qurro, a tool that interactively visualizes a plot of feature rankings (a ‘rank plot’) alongside a plot of selected features’ log-ratios within samples (a ‘sample plot’). Qurro’s interface includes various controls that allow users to select features from along the rank plot to compute a log-ratio; this action updates both the rank plot (through highlighting selected features) and the sample plot (through displaying the current log-ratios of samples). Here, we demonstrate how this unique interface helps users explore feature rankings and log-ratios simply and effectively.
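
The per-sample log-ratio that a sample plot displays can be sketched as follows. This is not Qurro's code: the function name `sample_log_ratios` and the data layout are hypothetical, and the sketch assumes the log-ratio is the natural log of summed numerator counts over summed denominator counts, with samples left undefined when either sum is zero.

```python
import math

def sample_log_ratios(counts, numerator, denominator):
    """Compute ln(sum of numerator features / sum of denominator features).

    counts      : {sample_id: {feature_id: count}}
    numerator   : feature IDs selected as the numerator (e.g. high-ranked)
    denominator : feature IDs selected as the denominator (e.g. low-ranked)
    Returns {sample_id: log-ratio, or None when either sum is zero}.
    """
    ratios = {}
    for sample, feats in counts.items():
        num = sum(feats.get(f, 0) for f in numerator)
        den = sum(feats.get(f, 0) for f in denominator)
        # The log-ratio is undefined if either side has no counts.
        ratios[sample] = math.log(num / den) if num > 0 and den > 0 else None
    return ratios

table = {"s1": {"a": 4, "b": 1, "c": 1}, "s2": {"a": 0, "b": 3}}
ratios = sample_log_ratios(table, numerator=["a"], denominator=["b", "c"])
```

Sample `s2` has no counts for the numerator feature, so its log-ratio is left undefined rather than replaced with a pseudocount.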

Shrinkage improves estimation of microbial associations under different normalization methods

M Badri, Z Kurtz, R. Bonneau, C. Müller

Consistent estimation of associations in microbial genomic survey count data is fundamental to microbiome research. Technical limitations, including compositionality, low sample sizes, and technical variability, obstruct standard application of association measures and require data normalization prior to estimating associations. Here, we investigate the interplay between data normalization and microbial association estimation by a comprehensive analysis of statistical consistency. Leveraging the large sample size of the American Gut Project (AGP), we assess the consistency of the two prominent linear association estimators, correlation and proportionality, under different sample scenarios and data normalization schemes, including RNA-seq analysis workflows and log-ratio transformations. We show that shrinkage estimation, a standard technique in high-dimensional statistics, can universally improve the quality of association estimates for microbiome data. We find that large-scale association patterns in the AGP data can be grouped into five normalization-dependent classes. Using microbial association network construction and clustering as examples of exploratory data analysis, we show that variance-stabilizing and log-ratio approaches provide for the most consistent estimation of taxonomic and structural coherence. Taken together, the findings from our reproducible analysis workflow have important implications for microbiome studies in multiple stages of analysis, particularly when only small sample sizes are available.
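
As a rough illustration of the two ingredients named in the abstract, the sketch below applies a centered log-ratio (CLR) transform and then shrinks the sample correlation matrix linearly toward the identity. It is a simplified stand-in, not the paper's workflow: the function names, the pseudocount, and the hand-picked shrinkage intensity `lam` are assumptions, whereas practical shrinkage estimators usually choose the intensity analytically from the data.

```python
import numpy as np

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform; rows are samples, columns are taxa."""
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)

def shrunk_correlation(data, lam):
    """Linear shrinkage of the correlation matrix toward the identity.

    lam = 0 returns the sample correlation; lam = 1 returns the identity.
    """
    r = np.corrcoef(data, rowvar=False)
    return (1.0 - lam) * r + lam * np.eye(r.shape[0])

# Toy count table: four samples by three taxa.
counts = [[10, 2, 5], [3, 8, 1], [7, 7, 2], [1, 4, 9]]
transformed = clr(counts)
shrunk = shrunk_correlation(transformed, lam=0.5)
```

Shrinkage leaves the diagonal at 1 and pulls every off-diagonal entry toward zero, trading a small bias for lower variance when samples are few.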

Classification of the Molecular Defects Associated with Pathogenic Variants of the SLC6A8 Creatine Transporter

M Salazar, N Zelt, R Saldivar, C Kuntz, S Chen, W Penn, R. Bonneau, J. Koehler, J Schlebach

More than 80 loss-of-function (LOF) mutations in the SLC6A8 creatine transporter (hCRT1) are responsible for cerebral creatine deficiency syndrome (CCDS), which gives rise to a spectrum of neurological defects, including intellectual disability, epilepsy, and autism spectrum disorder. To gain insight into the nature of the molecular defects caused by these mutations, we quantitatively profiled the cellular processing, trafficking, expression, and function of eight pathogenic CCDS variants in relation to the wild type (WT) and one neutral isoform. All eight CCDS variants exhibit measurable proteostatic deficiencies that likely contribute to the observed LOF. However, the magnitudes of their specific effects on the expression and trafficking of hCRT1 vary considerably, and we find that the LOF associated with two of these variants primarily arises from the disruption of the substrate-binding pocket. In conjunction with an analysis of structural models of the transporter, we use these data to suggest mechanistic classifications for these variants. To evaluate potential avenues for therapeutic intervention, we assessed the sensitivity of these variants to temperature and measured their response to the proteostasis regulator 4-phenylbutyrate (4-PBA). Only one of the tested variants (G132V) is sensitive to temperature, though its response to 4-PBA is negligible. Nevertheless, 4-PBA significantly enhances the activity of WT hCRT1 in HEK293T cells, which suggests it may be worth evaluating as a therapeutic for female intellectual disability patients carrying a single CCDS mutation. Together, these findings reveal that pathogenic SLC6A8 mutations cause a spectrum of molecular defects that should be taken into consideration in future efforts to develop CCDS therapeutics.

Characterization of antibiotic resistance and host-microbiome interactions in the human upper respiratory tract during influenza infection

L Zhang, C Forst, A Gordon, G Gussin, A Gerber, P Fernandez, T Ding, L Lashua, M Wang, A Balmaseda, R. Bonneau, B Zhang, E Ghedin

Background
The abundance and diversity of antibiotic resistance genes (ARGs) in the human respiratory microbiome remain poorly characterized. In the context of influenza virus infection, interactions between the virus, the host, and resident bacteria with pathogenic potential are known to complicate and worsen disease, resulting in coinfection and increased morbidity and mortality of infected individuals. When pathogenic bacteria acquire antibiotic resistance, they are more difficult to treat and of global health concern. Characterization of ARG expression in the upper respiratory tract could help better understand the role antibiotic resistance plays in the pathogenesis of influenza-associated bacterial secondary infection.

Results
Thirty-seven individuals participating in the Household Influenza Transmission Study (HITS) in Managua, Nicaragua, were selected for this study. We performed metatranscriptomics and 16S rRNA gene sequencing analyses on nasal and throat swab samples, and host transcriptome profiling on blood samples. Individuals clustered into two groups based on their microbial gene expression profiles, with several microbial pathways enriched with genes differentially expressed between groups. We also analyzed antibiotic resistance gene expression and determined that approximately 25% of the sequence reads that corresponded to antibiotic resistance genes mapped to Streptococcus pneumoniae and Staphylococcus aureus. Following construction of an integrated network of ARG expression with host gene co-expression, we identified several host key regulators involved in the host response to influenza virus and bacterial infections, and host gene pathways associated with specific antibiotic resistance genes.

Conclusions
This study indicates the host response to influenza infection could indirectly affect antibiotic resistance gene expression in the respiratory tract by impacting the microbial community structure and overall microbial gene expression. Interactions between the host systemic responses to influenza infection and antibiotic resistance gene expression highlight the importance of viral-bacterial co-infection in acute respiratory infections like influenza.
