CCB: Publications

Designing Peptides on a Quantum Computer

V. Mulligan, H. Melo, H. Merritt, S. Slocum, B. Weitzner, A. Watkins, D. Renfrew, C. Pelissier, P. Arora, R. Bonneau

Although a wide variety of quantum computers are currently being developed, actual computational results have been largely restricted to contrived, artificial tasks. Finding ways to apply quantum computers to useful, real-world computational tasks remains an active research area. Here we describe our mapping of the protein design problem to the D-Wave quantum annealer. We present a system whereby Rosetta, a state-of-the-art protein design software suite, interfaces with the D-Wave quantum processing unit to find amino acid side chain identities and conformations to stabilize a fixed protein backbone. Our approach, which we call the QPacker, uses a large side-chain rotamer library and the full Rosetta energy function, and in no way reduces the design task to a simpler format. We demonstrate that quantum annealer-based design can be applied to complex real-world design tasks, producing designed molecules comparable to those produced by widely adopted classical design approaches. We also show through large-scale classical folding simulations that the results produced on the quantum annealer can inform wet-lab experiments. For design tasks that scale exponentially on classical computers, the QPacker achieves nearly constant runtime performance over the range of problem sizes that could be tested. We anticipate better than classical performance scaling as quantum computers mature.
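
The QPacker formulation itself is not reproduced here. As a minimal sketch, assuming the common one-hot QUBO encoding of a discrete packing problem: each (position, rotamer) pair becomes one binary variable, one-body energies become linear coefficients, two-body energies become quadratic couplings, and a penalty term P*(sum_r x[i,r] - 1)^2 enforces one rotamer per position. The toy energies, the `penalty` weight and the helper names are hypothetical, and exhaustive enumeration stands in for the annealer.

```python
import itertools

def packing_qubo(onebody, twobody, penalty):
    """Build a QUBO dict {(u, v): coefficient} for a toy rotamer-packing problem.

    onebody[i][r]          -- energy of rotamer r at position i (hypothetical values)
    twobody[(i, r, j, s)]  -- pair energy of rotamer r at i with rotamer s at j, i < j
    penalty                -- weight P of the one-rotamer-per-position constraint
    Variable (i, r) = 1 means rotamer r is selected at position i.
    """
    Q = {}

    def add(u, v, w):
        key = (u, v) if u <= v else (v, u)
        Q[key] = Q.get(key, 0.0) + w

    for i, energies in enumerate(onebody):
        for r, e in enumerate(energies):
            # Linear term: rotamer energy plus the -P from expanding P*(sum x - 1)^2
            add((i, r), (i, r), e - penalty)
        for r, s in itertools.combinations(range(len(energies)), 2):
            # +2P coupling discourages choosing two rotamers at the same position
            add((i, r), (i, s), 2.0 * penalty)
    for (i, r, j, s), e in twobody.items():
        add((i, r), (j, s), e)
    return Q

def qubo_energy(Q, active):
    """Energy of the assignment whose set of 1-valued variables is `active`."""
    return sum(w for (u, v), w in Q.items() if u in active and v in active)

def brute_force(Q):
    """Classical stand-in for the annealer: enumerate every binary assignment."""
    variables = sorted({v for key in Q for v in key})
    best = None
    for bits in itertools.product([0, 1], repeat=len(variables)):
        active = {v for v, b in zip(variables, bits) if b}
        e = qubo_energy(Q, active)
        if best is None or e < best[0]:
            best = (e, active)
    return best

onebody = [[0.0, 1.0], [0.5, 0.0]]   # hypothetical energies, 2 positions x 2 rotamers
twobody = {(0, 0, 1, 1): 3.0}        # hypothetical clash: rotamer 0 at pos 0 vs rotamer 1 at pos 1
energy, chosen = brute_force(packing_qubo(onebody, twobody, penalty=10.0))
print(energy, sorted(chosen))  # -19.5 [(0, 0), (1, 0)]
```

Expanding the penalty gives -P on every variable and +2P between rotamers sharing a position (the constant +P per position is dropped), so the minimum sits at the best valid assignment's energy (0.5 here) minus P per position.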

Metabolome-Informed Microbiome Analysis Refines Metadata Classifications and Reveals Unexpected Medication Transfer in Captive Cheetahs

J. Gauglitz, J. Morton, A. Tripathi, S. Hansen, M. Gaffney, C. Carpenter, K. Weldon, R. Shah, A. Parampil, A. Fidgett, A. Swafford, R. Knight, P. Dorrestein

Topological defects determine the structure and function of physical and biological matter over a wide range of scales, from the turbulent vortices in planetary atmospheres, oceans or quantum fluids to bioelectrical signalling in the heart [1–3] and brain [4], and cell death [5]. Many advances have been made in understanding and controlling the defect dynamics in active [6–9] and passive [9,10] non-equilibrium fluids. Yet, it remains unknown whether the statistical laws that govern the dynamics of defects in classical [11] or quantum fluids [12–14] extend to the active matter [7,15,16] and information flows [17,18] in living systems. Here, we show that a defect-mediated turbulence underlies the complex wave propagation patterns of Rho-GTP signalling protein on the membrane of starfish egg cells, a process relevant to cytoskeletal remodelling and cell proliferation [19,20]. Our experiments reveal that the phase velocity field extracted from Rho-GTP concentration waves exhibits vortical defect motions and annihilation dynamics reminiscent of those seen in quantum systems [12,13], bacterial turbulence [15] and active nematics [7]. Several key statistics and scaling laws of the defect dynamics can be captured by a minimal Helmholtz–Onsager point vortex model [21] as well as a generic complex Ginzburg–Landau [22] continuum theory, suggesting a close correspondence between the biochemical signal propagation on the surface of a living cell and a widely studied class of two-dimensional turbulence [23] and wave [22] phenomena.

Optimal tuning of weighted kNN- and diffusion-based methods for denoising single cell genomics data

A. Tjärnberg, O. Mahmood, C. Jackson, G. Saldi, K. Cho, L. Christiaen, R. Bonneau

The analysis of single-cell genomics data presents several statistical challenges, and extensive efforts have been made to produce methods for the analysis of these data that impute missing values, address sampling issues, and quantify and correct for noise. In spite of such efforts, no consensus on best practices has been established, and all current approaches vary substantially based on the available data and empirical tests. The k-Nearest Neighbor Graph (kNN-G) is often used to infer the identities of, and relationships between, cells and is the basis of many widely used dimensionality-reduction and projection methods. The kNN-G has also been the basis for imputation methods using, e.g., neighbor averaging and graph diffusion. However, due to the lack of an agreed-upon optimal objective function for choosing hyperparameters, these methods tend to oversmooth data, thereby resulting in a loss of information with regard to cell identity and the specific gene-to-gene patterns underlying regulatory mechanisms. In this paper, we investigate the tuning of kNN- and diffusion-based denoising methods with a novel non-stochastic method for optimally preserving biologically relevant informative variance in single-cell data. The framework, Denoising Expression data with a Weighted Affinity Kernel and Self-Supervision (DEWÄKSS), uses a self-supervised technique to tune its parameters. We demonstrate that denoising with optimal parameters selected by our objective function (i) is robust to preprocessing methods using data from established benchmarks, (ii) disentangles cellular identity and maintains robust clusters over dimension-reduction methods, (iii) maintains variance along several expression dimensions, unlike previous heuristic-based methods that tend to oversmooth data variance, and (iv) rarely involves diffusion but rather uses a fixed weighted kNN graph for denoising. Together, these findings provide a new understanding of kNN- and diffusion-based denoising methods and serve as a foundation for future research. Code and example data for DEWÄKSS are available at https://gitlab.com/Xparx/dewakss/-/tree/Tjarnberg2020branch.
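
The code below is not the DEWÄKSS package or its exact objective. It is a minimal sketch of the self-supervised idea described above, assuming a noise2self-style criterion: each cell is predicted from its k nearest neighbors with the cell itself held out, and k is chosen to minimize the prediction error against the observed values. Uniform neighbor weights, Euclidean distance on raw expression, and the function names are simplifying, hypothetical choices.

```python
import numpy as np

def knn_weights(X, k):
    """Row-normalized kNN matrix with a zero diagonal (a cell never predicts itself)."""
    # Pairwise squared Euclidean distances between cells (rows of X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)           # exclude each cell from its own neighbor set
    nbrs = np.argsort(d2, axis=1)[:, :k]   # indices of the k closest other cells
    W = np.zeros_like(d2)
    W[np.arange(X.shape[0])[:, None], nbrs] = 1.0  # uniform weights (assumption)
    return W / W.sum(axis=1, keepdims=True)

def self_supervised_mse(X, k):
    """Mean squared error of predicting every cell from its neighbors only."""
    X_hat = knn_weights(X, k) @ X
    return float(((X_hat - X) ** 2).mean())

def choose_k(X, candidates):
    """Pick the neighborhood size with the lowest held-out prediction error."""
    scores = {k: self_supervised_mse(X, k) for k in candidates}
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(0)
# Two well-separated toy populations of 10 cells x 5 genes (synthetic, for illustration)
X = np.vstack([rng.normal(0.0, 0.1, (10, 5)), rng.normal(10.0, 0.1, (10, 5))])
best_k, scores = choose_k(X, [3, 5, 19])
print(best_k, scores)
```

Holding each cell out of its own prediction keeps the objective from trivially favoring no smoothing, while averaging over too many neighbors raises the error once cells from other populations enter the average.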

A Bayesian nonparametric approach to super-resolution single-molecule localization

M. Gabitto, H. Marie-Nelly, A. Pakman, A. Pataki, X. Darzacq, M. Jordan

We consider the problem of single-molecule identification in super-resolution microscopy. Super-resolution microscopy overcomes the diffraction limit by localizing individual fluorescing molecules in a field of view. This is particularly difficult since each individual molecule appears and disappears randomly across time and because the total number of molecules in the field of view is unknown. Additionally, data sets acquired with super-resolution microscopes can contain a large number of spurious fluorescent fluctuations caused by background noise.

To address these problems, we present a Bayesian nonparametric framework capable of identifying individual emitting molecules in super-resolved time series. We tackle the identification problem in the case in which each individual molecule has already been localized in space. First, we collapse observations in time and develop a fast algorithm that builds upon the Dirichlet process. Next, we augment the model to account for the temporal aspect of fluorophore photo-physics. Finally, we assess the performance of our methods with ground-truth data sets having known biological structure.
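
The Dirichlet process is what allows the number of molecules to remain unknown. As a minimal sketch of the prior only (not the authors' collapsed algorithm), the Chinese restaurant process below assigns each new observation to an existing cluster with probability proportional to that cluster's size, or to a new cluster with probability proportional to the concentration parameter `alpha`; in this setting a cluster would correspond to one candidate emitting molecule. `alpha` and the helper names are hypothetical.

```python
import random

def sample_crp(n, alpha, seed=None):
    """Draw a partition of n observations from a Chinese restaurant process."""
    rng = random.Random(seed)
    labels, counts = [], []
    for _ in range(n):
        # Existing cluster k is weighted by counts[k]; a new cluster by alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster (a new candidate molecule)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

def expected_clusters(n, alpha):
    """Prior expected number of clusters after n observations."""
    return sum(alpha / (alpha + i) for i in range(n))

labels, counts = sample_crp(50, alpha=1.0, seed=1)
print(len(counts), counts)
print(expected_clusters(50, 1.0))
```

Under this prior the expected number of clusters grows roughly like alpha * log(n), so the model can add molecules as observations accumulate instead of fixing their number in advance.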

Characterizing chromatin landscape from aggregate and single-cell genomic assays using flexible duration modeling

M. Gabitto, A. Rasmussen, O. Wapinski, K. Allaway, N. Carriero, G. Fishell, R. Bonneau

ATAC-seq has become a leading technology for probing the chromatin landscape of single and aggregated cells. Distilling functional regions from ATAC-seq data presents diverse analysis challenges. Methods commonly used to analyze chromatin accessibility datasets are adapted from algorithms designed to process different experimental technologies, disregarding the statistical and biological differences intrinsic to ATAC-seq. Here, we present ChromA, a Bayesian statistical approach that uses latent space models to better model accessible regions. ChromA annotates the chromatin landscape by integrating information from replicates, producing a consensus, de-noised annotation of chromatin accessibility. ChromA can analyze single-cell ATAC-seq data, correcting many biases generated by the sparse sampling inherent in single-cell technologies. We validate ChromA on multiple technologies and biological systems, including mouse and human immune cells, establishing ChromA as a top-performing general platform for mapping the chromatin landscape in different cellular populations from diverse experimental designs.
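
ChromA's actual model is not reproduced here. As a hedged illustration of the latent-state idea, the sketch below assumes a plain two-state hidden Markov model (background vs. accessible) with Poisson-distributed read counts per genomic bin and computes the posterior probability that each bin is accessible by forward-backward. A plain HMM implies geometric state durations, which is the restriction that flexible-duration models relax. The rates, the `p_stay` transition probability and the function names are hypothetical.

```python
import numpy as np
from math import lgamma

def poisson_logpmf(counts, rate):
    """Log Poisson probability of each count under a single rate."""
    log_fact = np.array([lgamma(c + 1.0) for c in counts])
    return counts * np.log(rate) - rate - log_fact

def posterior_accessible(counts, rates=(1.0, 8.0), p_stay=0.95, init=(0.5, 0.5)):
    """Forward-backward posterior of state 1 ("accessible") for each bin."""
    counts = np.asarray(counts, dtype=float)
    T, S = len(counts), len(rates)
    logB = np.stack([poisson_logpmf(counts, r) for r in rates], axis=1)  # (T, S)
    # Rescale each bin's likelihoods; a per-bin constant cancels in the posterior
    B = np.exp(logB - logB.max(axis=1, keepdims=True))
    A = np.full((S, S), (1.0 - p_stay) / (S - 1))  # A[i, j] = P(next = j | current = i)
    np.fill_diagonal(A, p_stay)
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    alpha[0] = np.asarray(init) * B[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                 # forward pass, normalized each step
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):        # backward pass, normalized each step
        beta[t] = A @ (B[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 1]

counts = [0, 1, 0, 0, 9, 10, 8, 11, 0, 1, 0]  # synthetic per-bin read counts
print(np.round(posterior_accessible(counts), 3))
```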

Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments

C. Jackson, D. Castro, G. Saldi, R. Bonneau, D. Gresham

Understanding how gene expression programs are controlled requires identifying regulatory relationships between transcription factors and target genes. Gene regulatory networks are typically constructed from gene expression data acquired following genetic perturbation or environmental stimulus. Single-cell RNA sequencing (scRNAseq) captures the gene expression state of thousands of individual cells in a single experiment, offering advantages in combinatorial experimental design, large numbers of independent measurements, and accessing the interaction between the cell cycle and environmental responses that is hidden by population-level analysis of gene expression. To leverage these advantages, we developed a method for scRNAseq in budding yeast (Saccharomyces cerevisiae). We pooled diverse transcriptionally barcoded gene deletion mutants in 11 different environmental conditions and determined their expression state by sequencing 38,285 individual cells. We benchmarked a framework for learning gene regulatory networks from scRNAseq data that incorporates multitask learning and constructed a global gene regulatory network comprising 12,228 interactions.

A Comprehensive Map of the Monocyte-Derived Dendritic Cell Transcriptional Network Engaged upon Innate Sensing of HIV

J. Johnson, N. De Veaux, A. Rives, X. Lahaye, S. Lucas, B. Perot, M. Luka, V. Garcia-Paredes, L. Amon, A. Watters, G. Abdessalem, A. Aderem, N. Manel, D. Littman, R. Bonneau, M. Ménager

Transcriptional programming of the innate immune response is pivotal for host protection. However, the transcriptional mechanisms that link pathogen sensing with innate activation remain poorly understood. During HIV-1 infection, human dendritic cells (DCs) can detect the virus through an innate sensing pathway, leading to antiviral interferon and DC maturation. Here, we develop an iterative experimental and computational approach to map the HIV-1 innate response circuitry in monocyte-derived DCs (MDDCs). By integrating genome-wide chromatin accessibility with expression kinetics, we infer a gene regulatory network that links 542 transcription factors with 21,862 target genes. We observe that an interferon response is required, yet insufficient, to drive MDDC maturation and identify PRDM1 and RARA as essential regulators of the interferon response and MDDC maturation, respectively. Our work provides a resource for interrogation of regulators of HIV replication and innate immunity, highlighting complexity and cooperativity in the regulatory circuit controlling the response to infection.

Temporal probabilistic modeling of bacterial compositions derived from 16S rRNA sequencing

T. Äijö, C. Müller, R. Bonneau

The number of microbial and metagenomic studies has increased drastically due to advancements in next-generation sequencing-based measurement techniques. Statistical analysis and the validity of conclusions drawn from (time series) 16S rRNA and other metagenomic sequencing data are hampered by the presence of a significant amount of noise and missing data (sampling zeros). Accounting for uncertainty in microbiome data is often challenging due to the difficulty of obtaining biological replicates. Additionally, the compositional nature of current amplicon and metagenomic data differs from many other biological data types, adding another challenge to the data analysis. To address these challenges in human microbiome research, we introduce a novel probabilistic approach that explicitly models overdispersion and sampling zeros by considering the temporal correlation between nearby time points using Gaussian processes. The proposed Temporal Gaussian Process Model for Compositional Data Analysis (TGP-CODA) shows superior modeling performance compared to commonly used Dirichlet-multinomial, multinomial, and non-parametric regression models on real and synthetic data. We demonstrate that the nonreplicative nature of human gut microbiota studies can be partially overcome by our method with proper experimental design of dense temporal sampling. We also show that different modeling approaches have a strong impact on ecological interpretation of the data, such as stationarity, persistence, and environmental noise models. A Stan implementation of the proposed method is available under MIT license at
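
The TGP-CODA Stan model is not reproduced here. As a hedged sketch of the generative idea only, assuming a squared-exponential kernel, a softmax link from latent values to compositions, and plain multinomial sampling (so no explicit overdispersion term, unlike the model described above), the code draws one latent Gaussian process trajectory per taxon over time, converts each time point to proportions, and samples read counts. Nearby time points share information through the kernel, which is the property the approach relies on to handle sampling zeros. Parameter values and function names are hypothetical.

```python
import numpy as np

def se_kernel(times, length_scale, variance):
    """Squared-exponential covariance between all pairs of time points."""
    d = times[:, None] - times[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def simulate_counts(times, n_taxa, depth, length_scale=2.0, variance=1.0, seed=0):
    """Sample compositions and read counts from a toy temporal GP model."""
    rng = np.random.default_rng(seed)
    # Small jitter on the diagonal for numerical stability
    K = se_kernel(times, length_scale, variance) + 1e-6 * np.eye(len(times))
    # One smooth latent trajectory per taxon: shape (n_taxa, T)
    latent = rng.multivariate_normal(np.zeros(len(times)), K, size=n_taxa)
    # Softmax across taxa at each time point gives proportions summing to 1
    expz = np.exp(latent - latent.max(axis=0))
    props = expz / expz.sum(axis=0)
    # Multinomial sequencing at a fixed depth per sample (no overdispersion here)
    counts = np.stack(
        [rng.multinomial(depth, props[:, t]) for t in range(len(times))], axis=1
    )
    return props, counts

times = np.arange(10.0)  # hypothetical daily sampling
props, counts = simulate_counts(times, n_taxa=4, depth=1000)
print(counts[:, :3])
```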

A single early-in-life macrolide course has lasting effects on murine microbial network topology and immunity

V. Ruiz, T. Battaglia, Z. Kurtz, L. Bijnens, A. Ou, I. Engstrand, X. Zheng, T. Iizumi, B. Mullins, C. Müller, K. Cadwell, R. Bonneau, G. Perez-Perez, M. Blaser

Broad-spectrum antibiotics are frequently prescribed to children. Early childhood represents a dynamic period for the intestinal microbial ecosystem, which is readily shaped by environmental cues; antibiotic-induced disruption of this sensitive community may have long-lasting host consequences. Here we demonstrate that a single pulsed macrolide antibiotic treatment (PAT) course early in life is sufficient to lead to durable alterations to the murine intestinal microbiota, ileal gene expression, specific intestinal T-cell populations, and secretory IgA expression. A PAT-perturbed microbial community is necessary for host effects and sufficient to transfer delayed secretory IgA expression. Additionally, early-life antibiotic exposure has lasting and transferable effects on microbial community network topology. Our results indicate that a single early-life macrolide course can alter the microbiota and modulate host immune phenotypes that persist long after exposure has ceased.

High or multiple doses of macrolide antibiotics, when given early in life, can perturb the metabolic and immunological development of lab mice. Here, Ruiz et al. show that even a single macrolide course, given early in life, leads to long-lasting changes in the gut microbiota and immune system of mice.

Integrated Analysis of Biopsies from Inflammatory Bowel Disease Patients Identifies SAA1 as a Link Between Mucosal Microbes with TH17 and TH22 Cells

M. Tang, R. Bowcutt, J. Leung, M. Wolff, U. Gundra, D. Hudesman, L. Malter, M. Poles, L. Chen, Z. Pei, A. Neto, W. Abidi, T. Ullman, L. Mayer, R. Bonneau, P. Loke

Background: Inflammatory bowel diseases (IBD) are believed to be driven by dysregulated interactions between the host and the gut microbiota. Our goal is to characterize and infer relationships between mucosal T cells, the host tissue environment, and microbial communities in patients with IBD, to serve as a basis for mechanistic studies of human IBD.

Methods: We characterized mucosal CD4+ T cells using flow cytometry, along with matching mucosal global gene expression and microbial community data from 35 pinch biopsy samples from patients with IBD. We analyzed these data sets using an integrated framework to identify predictors of inflammatory states and then reproduced some of the putative relationships formed among these predictors by analyzing data from the pediatric RISK cohort.

Results: We identified 26 predictors from our combined data set that were effective in distinguishing between regions of the intestine undergoing active inflammation and regions that were normal. Network analysis on these 26 predictors revealed SAA1 as the most connected node linking the abundance of the genus Bacteroides with the production of IL17 and IL22 by CD4+ T cells. These SAA1-linked microbial and transcriptome interactions were further reproduced with data from the pediatric IBD RISK cohort.

Conclusions: This study identifies expression of SAA1 as an important link between mucosal T cells, microbial communities, and their tissue environment in patients with IBD. A combination of T cell effector function data, gene expression and microbial profiling can distinguish between intestinal inflammatory states in IBD regardless of disease type.