SARS-CoV-2 RNA concentrations in wastewater foreshadow dynamics and clinical presentation of new COVID-19 cases
Current estimates of COVID-19 prevalence are largely based on symptomatic, clinically diagnosed cases. The existence of a large number of undiagnosed infections hampers population-wide investigation of viral circulation. Here, we quantify the SARS-CoV-2 concentration and track its dynamics in wastewater at a major urban wastewater treatment facility in Massachusetts, between early January and May 2020. SARS-CoV-2 was first detected in wastewater on March 3. SARS-CoV-2 RNA concentrations in wastewater correlated with clinically diagnosed new COVID-19 cases, with the trends appearing 4–10 days earlier in wastewater than in clinical data. We inferred viral shedding dynamics by modeling wastewater viral load as a convolution of back-dated new clinical cases with the average population-level viral shedding function. The inferred viral shedding function showed an early peak, likely before symptom onset and clinical diagnosis, consistent with emerging clinical and experimental evidence. This finding suggests that SARS-CoV-2 concentrations in wastewater may be primarily driven by viral shedding early in infection. This work shows that longitudinal wastewater analysis can be used to identify trends in disease transmission in advance of clinical case reporting, and infer early viral shedding dynamics for newly infected individuals, which are difficult to capture in clinical investigations.
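A minimal sketch of the convolution model described above. It assumes daily bins, a single fixed back-dating offset, and a shedding function supplied as a short array. The function name and inputs are hypothetical, and in the study the shedding function was inferred from data rather than assumed.

```python
import numpy as np

def predicted_wastewater_load(new_cases, shedding, backdate_days):
    """Hypothetical helper: model daily wastewater viral load as back-dated
    new clinical cases convolved with a population-level shedding function.

    new_cases[t]:  new clinically diagnosed cases reported on day t
    shedding[k]:   relative viral shedding k days after infection (assumed)
    backdate_days: assumed fixed delay (>= 0) from infection to reporting
    """
    cases = np.asarray(new_cases, dtype=float)
    backdated = np.zeros_like(cases)
    # A case reported on day t is treated as infected on day t - backdate_days.
    # The last backdate_days entries stay zero: those infections are not yet reported.
    if backdate_days > 0:
        backdated[:-backdate_days] = cases[backdate_days:]
    else:
        backdated[:] = cases
    # load[t] = sum_k backdated[t - k] * shedding[k], truncated to the observed days
    return np.convolve(backdated, np.asarray(shedding, dtype=float))[: len(cases)]

# A single batch of 10 cases reported on day 3, back-dated by 2 days,
# yields a load that peaks before the report date and then decays.
load = predicted_wastewater_load([0, 0, 0, 10, 0, 0, 0], [0.5, 0.3, 0.2], backdate_days=2)
print(load)  # [0. 5. 3. 2. 0. 0. 0.]
```

With a front-loaded shedding array, the modeled signal rises before the clinical reports, which is the pattern the abstract attributes to early viral shedding.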
One of the hallmarks of the cerebral cortex is the extreme diversity of interneurons. The two largest subtypes of cortical interneurons, parvalbumin- and somatostatin-positive cells, are morphologically and functionally distinct in adulthood but arise from common lineages within the medial ganglionic eminence. This makes them an attractive model for studying the generation of cell diversity. Here we examine how developmental changes in transcription and chromatin structure enable these cells to acquire distinct identities in the mouse cortex. Generic interneuron features are first detected upon cell cycle exit through the opening of chromatin at distal elements. By constructing cell-type-specific gene regulatory networks, we observed that parvalbumin- and somatostatin-positive cells initiate distinct programs upon settling within the cortex. We used these networks to model the differential transcriptional requirement of a shared regulator, Mef2c, and confirmed the accuracy of our predictions through loss-of-function experiments. We therefore reveal how a common molecular program diverges to enable these neuronal subtypes to acquire highly specialized properties by adulthood. Our methods provide a framework for examining the emergence of cellular diversity, as well as for quantifying and predicting the effect of candidate genes on cell-type-specific development.
In October of 2020, in response to the Coronavirus Disease 2019 (COVID-19) pandemic, our team hosted our first fully online workshop teaching the QIIME 2 microbiome bioinformatics platform. We had 75 enrolled participants who joined from at least 25 different countries on 6 continents, and we had 22 instructors on 4 continents. In the 5-day workshop, participants worked hands-on with a cloud-based shared compute cluster that we deployed for this course. The event was well received, and participants provided feedback and suggestions in a postworkshop questionnaire. In January of 2021, we followed this workshop with a second fully online workshop, incorporating lessons from the first. Here, we present details on the technology and protocols that we used to run these workshops, focusing on the first workshop and then introducing changes made for the second workshop. We discuss what worked well, what didn’t work well, and what we plan to do differently in future workshops.
Anchor extension: a structure-guided approach to design cyclic peptides targeting enzyme active sites
Despite recent success in computational design of structured cyclic peptides, de novo design of cyclic peptides that bind to any protein functional site remains difficult. To address this challenge, we develop a computational “anchor extension” methodology for targeting protein interfaces by extending a peptide chain around a non-canonical amino acid residue anchor. To test our approach using a well-characterized model system, we design cyclic peptides that inhibit histone deacetylases 2 and 6 (HDAC2 and HDAC6) with enhanced potency compared to the original anchor (IC50 values of 9.1 and 4.4 nM for the best binders compared to 5.4 and 0.6 µM for the anchor, respectively). The HDAC6 inhibitor is among the most potent reported so far. These results highlight the potential for de novo design of high-affinity protein-peptide interfaces, as well as the challenges that remain.
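To put the reported potency gain on one scale, the IC50 values above can be converted to a common unit and divided. This is simple arithmetic on the quoted numbers. An IC50 ratio is only a rough potency comparison and depends on assay conditions.

```python
# IC50 values quoted in the abstract, converted to nanomolar (1 µM = 1000 nM).
ic50_nM = {
    "HDAC2": {"anchor": 5.4 * 1000, "best_peptide": 9.1},
    "HDAC6": {"anchor": 0.6 * 1000, "best_peptide": 4.4},
}

for target, values in ic50_nM.items():
    # A lower IC50 means higher potency, so anchor / peptide is the fold gain.
    fold = values["anchor"] / values["best_peptide"]
    print(f"{target}: ~{fold:.0f}-fold lower IC50 than the anchor")
# HDAC2: ~593-fold lower IC50 than the anchor
# HDAC6: ~136-fold lower IC50 than the anchor
```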
Gene regulatory networks define regulatory relationships between transcription factors and target genes within a biological system, and reconstructing them is essential for understanding cellular growth and function. In this work, we present the Inferelator 3.0, which has been significantly updated to integrate data from distinct cell types to learn context-specific regulatory networks and aggregate them into a shared regulatory network, while retaining the functionality of the previous versions. The Inferelator 3.0 reliably learns informative networks from the model organisms Bacillus subtilis and Saccharomyces cerevisiae. We demonstrate its capabilities by learning networks for multiple distinct neuronal and glial cell types in the developing Mus musculus brain at E18 from a large (1.3 million) single-cell gene expression data set with paired single-cell chromatin accessibility data.
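The two-stage idea (learn one network per cell type, then aggregate into a shared network) can be made concrete with a deliberately simplified sketch. This is not the Inferelator's algorithm. It uses plain least squares per context in place of the package's own regression and model-selection methods and ignores any prior knowledge. It also keeps an edge in the shared network only if its weight exceeds a hypothetical `threshold` in at least `min_contexts` cell types. All names are illustrative.

```python
import numpy as np

def context_network(tf_expr, target_expr):
    """Least-squares regulatory weights for one context (cell type).

    tf_expr:     (n_cells, n_tfs) regulator expression
    target_expr: (n_cells, n_targets) target gene expression
    Returns W of shape (n_tfs, n_targets) with target_expr ~= tf_expr @ W.
    """
    W, *_ = np.linalg.lstsq(tf_expr, target_expr, rcond=None)
    return W

def shared_network(contexts, threshold, min_contexts):
    """Boolean (n_tfs, n_targets) matrix: True where |weight| > threshold
    in at least min_contexts of the per-context networks."""
    edges = [np.abs(context_network(tf, tg)) > threshold for tf, tg in contexts]
    return np.sum(edges, axis=0) >= min_contexts

# Two synthetic, noise-free "cell types" that share only the TF0 -> gene0 edge.
rng = np.random.default_rng(0)
W_a = np.array([[2.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
W_b = np.array([[2.0, 0.0], [0.0, 1.5], [0.0, 0.0]])
tf_a, tf_b = rng.normal(size=(100, 3)), rng.normal(size=(100, 3))
contexts = [(tf_a, tf_a @ W_a), (tf_b, tf_b @ W_b)]
shared = shared_network(contexts, threshold=0.5, min_contexts=2)
print(shared.astype(int))
```

Here only the TF0 to gene0 edge survives aggregation. Lowering `min_contexts` to 1 would instead return the union of the context-specific edges.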
As non-“self” macromolecules, biotherapeutics can trigger an immune response that can reduce drug efficacy, require patients to be taken off therapy, or even cause life-threatening reactions. To enable the flexible and facile design of protein biotherapeutics while reducing the prevalence of T-cell epitopes that drive immune recognition, we have integrated into the Rosetta protein design suite a new scoring term that allows design protocols to account for predicted or experimentally identified epitopes in the optimized objective function. This flexible scoring term can be used in any Rosetta design trajectory, can be targeted to specific regions of a protein, and can be readily extended to work with a variety of epitope predictors. By performing extensive design runs with varied design parameter choices for three case study proteins as well as a larger diverse benchmark, we show that the incorporation of this scoring term enables the effective exploration of an alternative, deimmunized sequence space to discover diverse proteins that are potentially highly deimmunized while retaining physical and chemical qualities similar to those yielded by equivalent nondeimmunizing sequence design protocols.
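The general idea of folding an epitope penalty into a design objective can be sketched in a few lines. Everything here is hypothetical and written in Python for illustration. It is not Rosetta's scoring-term API, the 9-residue window is an assumed peptide length, and `is_epitope` stands in for whatever predictor or experimental epitope list is plugged in. Passing `positions` mimics targeting the term to one region of the protein.

```python
from typing import Callable, Iterable, Optional

def epitope_penalty(sequence: str,
                    is_epitope: Callable[[str], bool],
                    window: int = 9,
                    positions: Optional[Iterable[int]] = None) -> int:
    """Count window-length peptides flagged as epitopes by is_epitope.

    positions: optional start indices to scan (restricts the term to a region);
    by default every full-length window in the sequence is scanned.
    """
    last_start = len(sequence) - window
    starts = range(last_start + 1) if positions is None else positions
    return sum(
        1 for i in starts
        if 0 <= i <= last_start and is_epitope(sequence[i:i + window])
    )

def design_objective(energy: float, sequence: str,
                     is_epitope: Callable[[str], bool],
                     weight: float = 1.0, **kwargs) -> float:
    """Hypothetical combined score: energy plus weighted epitope count
    (lower is better, as in an energy-minimizing design search)."""
    return energy + weight * epitope_penalty(sequence, is_epitope, **kwargs)

# Usage with a fixed set of known epitopes acting as the "predictor".
known_epitopes = {"K" + "L" * 8}
seq = "A" + "K" + "L" * 8
print(design_objective(-100.0, seq, known_epitopes.__contains__, weight=2.0))  # -98.0
```

Taking the predictor as a callable is what lets the same penalty work with different epitope predictors, and the `weight` parameter sets the trade-off between deimmunization and the physical score.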
Ensuring scientific reproducibility in bio-macromolecular modeling via extensive, automated benchmarks
Each year vast international resources are wasted on irreproducible research. The scientific community has been slow to adopt standard software engineering practices, despite the increases in high-dimensional data, complexities of workflows, and computational environments. Here we show how scientific software applications can be created in a reproducible manner when simple design goals for reproducibility are met. We describe the implementation of a test server framework and 40 scientific benchmarks, covering numerous applications in Rosetta bio-macromolecular modeling. High performance computing cluster integration allows these benchmarks to run continuously and automatically. Detailed protocol captures are useful for developers and users of Rosetta and other macromolecular modeling tools. The framework and design concepts presented here are valuable for developers and users of any type of scientific software and for the scientific community to create reproducible methods. Specific examples highlight the utility of this framework and the comprehensive documentation illustrates the ease of adding new tests in a matter of hours.
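The following is a minimal sketch of what one automated scientific benchmark might check. It assumes a protocol whose randomness is fully controlled by a seed and a quality metric with an agreed acceptance threshold. `run_protocol` and `benchmark` are hypothetical stand-ins, not part of Rosetta or its test server.

```python
import random

def run_protocol(seed: int) -> float:
    """Hypothetical stand-in for a stochastic modeling run that returns a
    quality metric (here, the fraction of 'successful' trials)."""
    rng = random.Random(seed)  # all randomness flows from the seed
    trials = [rng.random() < 0.7 for _ in range(1000)]
    return sum(trials) / len(trials)

def benchmark(seed: int, min_metric: float) -> dict:
    """Run the protocol twice with the same seed and report whether the
    result is reproducible and whether it meets the acceptance threshold."""
    first = run_protocol(seed)
    second = run_protocol(seed)
    return {
        "metric": first,
        "reproducible": first == second,
        "passed": first >= min_metric,
    }

result = benchmark(seed=42, min_metric=0.6)
print(result)
```

When a check like this is run continuously, for example on every code change, a failing `passed` flag points to a scientific regression. A false `reproducible` flag points to hidden nondeterminism.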
Constrained non-negative matrix factorization enabling real-time insights of in situ and high-throughput experiments
Non-negative Matrix Factorization (NMF) offers an appealing unsupervised learning approach for real-time analysis of streaming spectral data in time-sensitive data collection, such as in situ characterization of materials. However, canonical NMF methods are optimized to reconstruct a full dataset as closely as possible, with no underlying requirement that the reconstruction produce components or weights representative of the true physical processes. In this work, we demonstrate how constraining NMF weights or components, provided as known or assumed priors, can significantly improve the recovery of the true underlying phenomena. We present a PyTorch-based method for efficiently applying constrained NMF and demonstrate it on several synthetic examples. When the method is applied to streaming experimentally measured spectral data, an expert researcher-in-the-loop can provide and dynamically adjust the constraints. These interactive priors can, for example, include known or identified independent components, as well as functional expectations about how the components mix. We demonstrate this application on measured X-ray diffraction and pair distribution function data from in situ beamline experiments. Details of the method are described, and general guidance is provided for employing constrained NMF to extract critical information and insights during in situ and high-throughput experiments.
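The core idea of holding known components fixed can be sketched with NumPy and the standard Lee-Seung multiplicative updates. This is an assumed, simplified stand-in for the PyTorch implementation described above. It shows only one kind of constraint (fixed component spectra), and the function names and arguments are hypothetical.

```python
import numpy as np

def constrained_nmf(X, k, fixed_H=None, n_iter=500, seed=0, eps=1e-10):
    """Factor non-negative X (n_patterns x n_points) as W @ H with k components.

    fixed_H: optional (n_fixed x n_points) known component spectra, held
    constant as the first n_fixed rows of H; only the weights W and the
    remaining components are learned.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    n_fixed = 0 if fixed_H is None else fixed_H.shape[0]
    if n_fixed > k:
        raise ValueError("more fixed components than k")
    if n_fixed:
        H[:n_fixed] = fixed_H
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius loss ||X - W H||^2
        H_new = H * (W.T @ X) / (W.T @ W @ H + eps)
        H_new[:n_fixed] = H[:n_fixed]  # constraint: known components unchanged
        H = H_new
        W = W * (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def relative_error(X, W, H):
    """Frobenius reconstruction error relative to the norm of X."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)

# Synthetic mixture of two non-negative "patterns"; component 0 is known a priori.
rng = np.random.default_rng(1)
H_true = rng.random((2, 50))
W_true = rng.random((30, 2))
X = W_true @ H_true
W, H = constrained_nmf(X, k=2, fixed_H=H_true[:1], n_iter=2000)
print(relative_error(X, W, H))
```

Because the fixed rows are never updated, the recovered weights for the known component stay tied to a physically meaningful spectrum. Unconstrained NMF can instead mix it with the unknown components.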
Standard workflows for analyzing microbiomes often include the creation and curation of phylogenetic trees. Here we present EMPress, an interactive web tool for visualizing trees in the context of microbiome, metabolome, and other community data scalable to trees with well over 500,000 nodes. EMPress provides novel functionality—including ordination integration and animations—alongside many standard tree visualization features and thus simplifies exploratory analyses of many forms of ‘omic data.
Quantifying Live Microbial Load in Human Saliva Samples over Time Reveals Stable Composition and Dynamic Load
Evaluating microbial community composition through next-generation sequencing has become increasingly accessible. However, metagenomic sequencing data sets provide researchers with only a snapshot of a dynamic ecosystem and do not provide information about the total microbial number, or load, of a sample. Additionally, DNA can be detected long after a microorganism is dead, making it unsafe to assume that all microbial sequences detected in a community came from living organisms. By combining relic DNA removal by propidium monoazide (PMA) with microbial quantification by flow cytometry, we present a novel workflow to quantify live microbial load in parallel with metagenomic sequencing. We applied this method to unstimulated saliva samples, which can easily be collected longitudinally and standardized by passive collection time. We found that the number of live microorganisms detected in saliva was inversely correlated with salivary flow rate and fluctuated by an order of magnitude throughout the day in healthy individuals. In an acute perturbation experiment, alcohol-free mouthwash resulted in a massive decrease in live bacteria, which would have been missed had we not accounted for the dead-cell signal. While removing relic DNA from saliva samples did not greatly impact the microbial composition, it did increase our resolution among samples collected over time. These results provide novel insight into the dynamic nature of host-associated microbiomes and underline the importance of applying scale-invariant tools in the analysis of next-generation sequencing data sets.
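Combining relative composition with total live load amounts to a per-sample rescaling. The sketch below assumes that relative abundances come from PMA-treated sequencing, so they approximate the live community, and that the total comes from a flow cytometry count. The function name, taxon labels, and numbers are hypothetical.

```python
def live_absolute_abundance(relative_abundance: dict, live_cells_per_ml: float) -> dict:
    """Scale per-taxon relative abundances by a sample's total live-cell load.

    relative_abundance: taxon -> read count or fraction (renormalized here)
    live_cells_per_ml:  total live cells per mL measured by flow cytometry
    """
    total = sum(relative_abundance.values())
    return {taxon: live_cells_per_ml * value / total
            for taxon, value in relative_abundance.items()}

# Same composition at two time points, with a tenfold difference in live load.
composition = {"taxon_A": 0.6, "taxon_B": 0.4}
morning = live_absolute_abundance(composition, 1e8)
evening = live_absolute_abundance(composition, 1e7)
print(morning)
print(evening)
```

Both samples have identical proportions, yet every taxon differs tenfold in absolute terms. Proportions alone cannot reveal this kind of change.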