Publications

c-Maf-dependent regulatory T cells mediate immunological tolerance to intestinal microbiota

M Xu, M Pokrovskii, Y Ding, R Yi, C Au, C Galan, R. Bonneau

Both microbial and host genetic factors contribute to the pathogenesis of autoimmune disease [1-4]. Accumulating evidence suggests that microbial species that potentiate chronic inflammation, as in inflammatory bowel disease (IBD), often also colonize healthy individuals. These microbes, including the Helicobacter species, have the propensity to induce autoreactive T cells and are collectively referred to as pathobionts [4-8]. However, an understanding of how such T cells are constrained in healthy individuals is lacking. Here we report that host tolerance to a potentially pathogenic bacterium, Helicobacter hepaticus (H. hepaticus), is mediated by induction of RORγt+Foxp3+ regulatory T cells (iTreg) that selectively restrain pro-inflammatory TH17 cells and whose function is dependent on the transcription factor c-Maf. Whereas H. hepaticus colonization of wild-type mice promoted differentiation of RORγt-expressing microbe-specific iTreg in the large intestine, in disease-susceptible IL-10-deficient animals there was instead expansion of colitogenic TH17 cells. Inactivation of c-Maf in the Treg compartment likewise impaired differentiation of bacteria-specific iTreg, resulting in accumulation of H. hepaticus-specific inflammatory TH17 cells and spontaneous colitis. In contrast, RORγt inactivation in Treg only had a minor effect on bacterial-specific Treg-TH17 balance, and did not result in inflammation. Our results suggest that pathobiont-dependent IBD is a consequence of microbiota-reactive T cells that have escaped this c-Maf-dependent mechanism of iTreg-TH17 homeostasis.

An Adaptive Geometric Search Algorithm for Macromolecular Scaffold Selection

T Jiang, D. Renfrew, K Drew, N Youngs, G Butterfoss, D Shasha, R. Bonneau

A wide variety of protein and peptidomimetic design tasks require matching functional three-dimensional motifs to potential oligomeric scaffolds. Enzyme design, for example, aims to graft active-site patterns typically consisting of 3 to 15 residues onto new protein surfaces. Identifying suitable proteins capable of scaffolding such active-site engraftment requires costly searches to identify protein folds that can provide the correct positioning of side chains to host the desired active site. Other examples of biodesign tasks that require simpler fast exact geometric searches of potential side chain positioning include mimicking binding hotspots, design of metal binding clusters and the design of modular hydrogen binding networks for specificity. In these applications the speed and scaling of geometric search limits downstream design to small patterns. Here we present an adaptive algorithm to search for side chain take-off angles compatible with an arbitrarily specified functional pattern that enjoys substantive performance improvements over previous methods. We demonstrate this method in both genetically encoded (protein) and synthetic (peptidomimetic) design scenarios. Examples of using this method with the Rosetta framework for protein design are provided but our implementation is compatible with multiple protein design frameworks and is freely available as a set of Python scripts (https://github.com/JiangTian/adaptive-geometric-search-for-protein-design).

Explicit Modeling of RNA Stability Improves Large-Scale Inference of Transcription Regulation

K Tchourine, C Vogel, R. Bonneau

Inference of eukaryotic transcription regulatory networks remains challenging due to the large number of regulators, combinatorial interactions, and redundant pathways. Even in the model system Saccharomyces cerevisiae, inference has performed poorly. Most existing inference algorithms ignore crucial regulatory components, like RNA stability and post-transcriptional modulation of regulators. Here we demonstrate that explicitly modeling transcription factor activity and RNA half-lives during inference of a genome-wide transcription regulatory network in yeast not only advances prediction performance, but also produces new insights into gene- and condition-specific variation of RNA stability. We curated a high quality gold standard reference network that we use for priors on network structure and model validation. We incorporate variation of RNA half-lives into the Inferelator inference framework, and show improved performance over previously described algorithms and over implementations of the algorithm that do not model RNA degradation. We recapitulate known condition- and gene-specific trends in RNA half-lives, and make new predictions about RNA half-lives that are confirmed by experimental data.

Compressed sensing and optimal denoising of monotone signals

E. Pnevmatikakis

We consider the problems of compressed sensing and optimal denoising for signals $\mathbf{x_0}\in\mathbb{R}^N$ that are monotone, i.e., $\mathbf{x_0}(i+1) \geq \mathbf{x_0}(i)$, and sparsely varying, i.e., $\mathbf{x_0}(i+1) > \mathbf{x_0}(i)$ only for a small number $k$ of indices $i$. We approach the compressed sensing problem by minimizing the total variation norm restricted to the class of monotone signals subject to equality constraints obtained from a number of measurements $A\mathbf{x_0}$. For random Gaussian sensing matrices $A\in\mathbb{R}^{m\times N}$ we derive a closed form expression for the number of measurements $m$ required for successful reconstruction with high probability. We show that the probability undergoes a phase transition as $m$ varies, and depends not only on the number of change points, but also on their location. For denoising we regularize with the same norm and derive a formula for the optimal regularizer weight that depends only mildly on $\mathbf{x_0}$. We obtain our results using the statistical dimension tool.
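
As a small illustration of the signal class defined in this abstract, the sketch below checks monotonicity, counts the change points $k$, and evaluates the total variation, which for a monotone signal equals the last entry minus the first. The helper names and the example signal are hypothetical and are not taken from the paper; this is not the authors' reconstruction or denoising procedure.

```python
def is_monotone(x):
    """True if x[i+1] >= x[i] for every consecutive pair."""
    return all(b >= a for a, b in zip(x, x[1:]))


def change_points(x):
    """0-based indices i where x[i+1] > x[i], i.e. where the signal jumps."""
    return [i for i, (a, b) in enumerate(zip(x, x[1:])) if b > a]


def total_variation(x):
    """Sum of absolute differences between consecutive entries."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


# Hypothetical example: N = 8 entries with k = 2 change points.
x0 = [0.0, 0.0, 0.0, 1.5, 1.5, 1.5, 4.0, 4.0]

assert is_monotone(x0)
k = len(change_points(x0))
tv = total_variation(x0)

# For a monotone signal the total variation telescopes to x0[-1] - x0[0].
print(k, tv, x0[-1] - x0[0])  # prints: 2 4.0 4.0
```

The telescoping identity is why the total variation is linear on the class of monotone signals, which is the class the minimization in the abstract is restricted to.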

Forces positioning the mitotic spindle in the cell; Theories, and now experiments

H. Wu, M. Shelley, D.J. Needleman

The position of the spindle determines the position of the cleavage plane, and is thus crucial for cell division. Although spindle positioning has been extensively studied, the underlying forces ultimately responsible for moving the spindle remain poorly understood. A recent pioneering study by Garzon-Coral et al. uses magnetic tweezers to perform the first direct measurements of the forces involved in positioning the mitotic spindle. Combining this with molecular perturbations and geometrical effects, they use their data to argue that the forces that keep the spindle in its proper position for cell division arise from astral microtubules growing and pushing against the cell's cortex. Here, we review these ground-breaking experiments, the various biomechanical models for spindle positioning that they seek to differentiate, and discuss new questions raised by these measurements.

Comoving stars in Gaia DR1: An abundance of very wide separation co-moving pairs

S. Oh, A.M. Price-Whelan, D. Hogg, T.D. Morton, D. Spergel

The primary sample of the Gaia Data Release 1 is the Tycho-Gaia Astrometric Solution (TGAS): ≈ 2 million Tycho-2 sources with improved parallaxes and proper motions relative to the initial catalog. This increased astrometric precision presents an opportunity to find new binary stars and moving groups. We search for high-confidence comoving pairs of stars in TGAS by identifying pairs of stars consistent with having the same 3D velocity using a marginalized likelihood ratio test to discriminate candidate comoving pairs from the field population. Although we perform some visualizations using (bias-corrected) inverse parallax as a point estimate of distance, the likelihood ratio is computed with a probabilistic model that includes the covariances of parallax and proper motions and marginalizes the (unknown) true distances and 3D velocities of the stars. We find 13,085 comoving star pairs among 10,606 unique stars with separations as large as 10 pc (our search limit). Some of these pairs form larger groups through mutual comoving neighbors: many of these pair networks correspond to known open clusters and OB associations, but we also report the discovery of several new comoving groups. Most surprisingly, we find a large number of very wide (>1 pc) separation comoving star pairs, the number of which increases with increasing separation and cannot be explained purely by false-positive contamination. Our key result is a catalog of high-confidence comoving pairs of stars in TGAS. We discuss the utility of this catalog for making dynamical inferences about the Galaxy, testing stellar atmosphere models, and validating chemical abundance measurements.

Fused regression for multi-source gene regulatory network inference

K Lam, Z Westrick, C. Müller, L Christiaen, R. Bonneau

Understanding gene regulatory networks is critical to understanding cellular differentiation and response to external stimuli. Methods for global network inference have been developed and applied to a variety of species. Most approaches consider the problem of network inference independently in each species, despite evidence that gene regulation can be conserved even in distantly related species. Further, network inference is often confined to single data-types (single platforms) and single cell types. We introduce a method for multi-source network inference that allows simultaneous estimation of gene regulatory networks in multiple species or biological processes through the introduction of priors based on known gene relationships such as orthology incorporated using fused regression. This approach improves network inference performance even when orthology mapping and conservation are incomplete. We refine this method by presenting an algorithm that extracts the true conserved subnetwork from a larger set of potentially conserved interactions and demonstrate the utility of our method in cross species network inference. Last, we demonstrate our method’s utility in learning from data collected on different experimental platforms.

Data-driven, interpretable photometric redshifts trained on heterogeneous and unrepresentative data

B. Leistedt, D. Hogg

We present a new method for inferring photometric redshifts in deep galaxy and quasar surveys, based on a data-driven model of latent spectral energy distributions (SEDs) and a physical model of photometric fluxes as a function of redshift. This conceptually novel approach combines the advantages of both machine-learning and template-fitting methods by building template SEDs directly from the training data. This is made computationally tractable with Gaussian Processes operating in flux-redshift space, encoding the physics of redshift and the projection of galaxy SEDs onto photometric band passes. This method alleviates the need to acquire representative training data or construct detailed galaxy SED models; it requires only that the photometric band passes and calibrations be known or have parameterized unknowns. The training data can consist of a combination of spectroscopic and deep many-band photometric data, which do not need to entirely spatially overlap with the target survey of interest or even involve the same photometric bands. We showcase the method on the i-magnitude-selected, spectroscopically-confirmed galaxies in the COSMOS field. The model is trained on the deepest bands (from SUBARU and HST) and photometric redshifts are derived using the shallower SDSS optical bands only. We demonstrate that we obtain accurate redshift point estimates and probability distributions despite the training and target sets having very different redshift distributions, noise properties, and even photometric bands. Our model can also be used to predict missing photometric fluxes, or to simulate populations of galaxies with realistic fluxes and redshifts, for example. This method opens a new era in which photometric redshifts for large photometric surveys are derived using a flexible yet physical model of the data trained on all available surveys (spectroscopic and photometric).

Bosonic self-energy functional theory

Dario Hügel, Philipp Werner, Lode Pollet, H. Strand

We derive the self-energy functional theory for bosonic lattice systems with broken U(1) symmetry by parametrizing the bosonic Baym-Kadanoff effective action in terms of one- and two-point self-energies. The formalism goes beyond other approximate methods such as the pseudoparticle variational cluster approximation, the cluster composite boson mapping, and the Bogoliubov+U theory. It simplifies to bosonic dynamical-mean-field theory when constraining to local fields, whereas when neglecting kinetic contributions of noncondensed bosons, it reduces to the static mean-field approximation. To benchmark the theory, we study the Bose-Hubbard model on the two- and three-dimensional cubic lattice, comparing with exact results from path integral quantum Monte Carlo. We also study the frustrated square lattice with next-nearest-neighbor hopping, which is beyond the reach of Monte Carlo simulations. A reference system comprising a single bosonic state, corresponding to three variational parameters, is sufficient to quantitatively describe phase boundaries and thermodynamical observables, while qualitatively capturing the spectral functions, as well as the enhancement of kinetic fluctuations in the frustrated case. On the basis of these findings, we propose self-energy functional theory as the omnibus framework for treating bosonic lattice models, in particular, in cases where path integral quantum Monte Carlo methods suffer from severe sign problems (e.g., in the presence of nontrivial gauge fields or frustration). Self-energy functional theory enables the construction of diagrammatically sound approximations that are quantitatively precise and controlled in the number of optimization parameters but nevertheless remain computable by modest means.

Rotamer libraries for the high-resolution design of beta-amino acid foldamers

A Watkins, D. Renfrew, T Craven, P Arora, R. Bonneau

β-amino acids offer attractive opportunities to develop biologically active peptidomimetics, either employed alone or in conjunction with natural α-amino acids. Owing to their potential for unique conformational preferences that deviate considerably from α-peptide geometries, β-amino acids greatly expand the possible chemistries and physical properties available to polyamide foldamers. Complete in silico support for designing new molecules incorporating nonnatural amino acids typically requires representing their side chain conformations as sets of discrete rotamers for model refinement and sequence optimization. Such rotamer libraries are key components of several state-of-the-art design frameworks. Here we report the development, incorporation into the Rosetta macromolecular modeling suite, and validation of rotamer libraries for β3-amino acids.
