2697 Publications

Tree-aggregated predictive modeling of microbiome data

Jacob Bien, Xiaohan Yan, Léo Simpson, C. Müller

Modern high-throughput sequencing technologies provide low-cost microbiome survey data across all habitats of life at unprecedented scale. At the most granular level, the primary data consist of sparse counts of amplicon sequence variants or operational taxonomic units that are associated with taxonomic and phylogenetic group information. In this contribution, we leverage the hierarchical structure of amplicon data and propose a data-driven and scalable tree-guided aggregation framework to associate microbial subcompositions with response variables of interest. The excess number of zero or low count measurements at the read level forces traditional microbiome data analysis workflows to remove rare sequencing variants or group them by a fixed taxonomic rank, such as genus or phylum, or by phylogenetic similarity. By contrast, our framework, which we call trac (tree-aggregation of compositional data), learns data-adaptive taxon aggregation levels for predictive modeling, greatly reducing the need for user-defined aggregation in preprocessing while simultaneously integrating seamlessly into the compositional data analysis framework. We illustrate the versatility of our framework in the context of large-scale regression problems in human gut, soil, and marine microbial ecosystems. We posit that the inferred aggregation levels provide highly interpretable taxon groupings that can help microbiome researchers gain insights into the structure and functioning of the underlying ecosystem of interest.

July 15, 2021

On the Origin of Stochastic, Low-Frequency Photometric Variability in Massive Stars

M. Cantiello, D. Lecoanet, A. Jermyn, L. Grassitelli

High-precision photometric observations have revealed ubiquitous stochastic low-frequency photometric variability in early-type stars. It has been suggested that this variability arises due to either subsurface convection or internal gravity waves launched by the convective core. Here we show that relevant properties of convection in subsurface convective layers correlate very well with the timescale and amplitude of stochastic low-frequency photometric variability, as well as with the amplitude of macroturbulence. We suggest that low-frequency, stochastic photometric variability and surface turbulence in massive stars are caused by the presence of subsurface convection. We show that an explanation for the observed surface photometric variability and macroturbulence relying on convective core driven internal gravity waves encounters a number of difficulties and seems unlikely to be able to explain the observed trends.

Analyzing black-hole ringdowns

A perturbed black hole rings down by emitting gravitational waves in tones with specific frequencies and durations. Such tones encode prized information about the geometry of the source spacetime and the fundamental nature of gravity, making the measurement of black hole ringdowns a key goal of gravitational wave astronomy. However, this task is plagued by technical challenges that invalidate the naive application of standard data analysis methods and complicate sensitivity projections. In this paper, we provide a comprehensive account of the formalism required to properly carry out ringdown analyses, examining in detail the foundations of recent observational results, and providing a framework for future measurements. We build on those insights to clarify the concepts of ringdown detectability and resolvability (touching on the drawbacks of both Bayes factors and naive Fisher matrix approaches) and find that overly pessimistic heuristics have led previous works to underestimate the role of ringdown overtones for black hole spectroscopy. We put our framework to work on the analysis of a variety of simulated signals in colored noise, including analytic injections and a numerical relativity simulation consistent with GW150914. We demonstrate that we can use tones of the quadrupolar angular harmonic to test the no-hair theorem at current sensitivity, with precision comparable to published constraints from real data. Finally, we assess the role of modeling systematics, and project measurements for future, louder signals. We release ringdown, a Python library for analyzing black hole ringdowns using the methods discussed in this paper, under a permissive open-source license at https://github.com/maxisi/ringdown.
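
As a rough illustration of what "tones with specific frequencies and durations" means, the sketch below builds a signal as a sum of damped sinusoids using NumPy only. It does not use the ringdown library's API; the function name `ringdown_waveform` and all numerical values are hypothetical and chosen only for demonstration.

```python
import numpy as np

def ringdown_waveform(t, amplitudes, frequencies, damping_times, phases):
    """Sum of damped sinusoids, one per tone, evaluated at times t >= 0.

    Each tone is A * exp(-t / tau) * cos(2*pi*f*t + phi).
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]  # (n_samples, 1)
    A = np.asarray(amplitudes, dtype=float)                  # (n_tones,)
    f = np.asarray(frequencies, dtype=float)
    tau = np.asarray(damping_times, dtype=float)
    phi = np.asarray(phases, dtype=float)
    tones = A * np.exp(-t / tau) * np.cos(2.0 * np.pi * f * t + phi)
    return tones.sum(axis=1)                                 # (n_samples,)

# Illustrative values only (assumptions, not taken from the paper):
# a longer-lived tone plus a shorter-lived one at a nearby frequency.
t = np.linspace(0.0, 0.05, 2048)  # seconds
h = ringdown_waveform(
    t,
    amplitudes=[1.0, 0.5],
    frequencies=[250.0, 245.0],      # Hz
    damping_times=[4.0e-3, 1.4e-3],  # seconds
    phases=[0.0, np.pi / 3],
)
```

The shorter-lived tone contributes mainly at early times, which is one intuitive reason why separating tones in noisy data is delicate.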

Coherent Electromagnetic Emission from Relativistic Magnetized Shocks

Lorenzo Sironi, Illya Plotnikov, J. Nättilä, Andrei M. Beloborodov

Relativistic magnetized shocks are a natural source of coherent emission, offering a plausible radiative mechanism for Fast Radio Bursts (FRBs). We present first-principles 3D simulations that provide essential information for the FRB models based on shocks: the emission efficiency, spectrum, and polarization. The simulated shock propagates in an $e^\pm$ plasma with magnetization $\sigma>1$. The measured fraction of shock energy converted to coherent radiation is $\simeq 10^{-3}\sigma^{-1}$, and the energy-carrying wavenumber of the wave spectrum is $\simeq 4\omega_c/c$, where $\omega_c$ is the upstream gyrofrequency. The ratio of the O-mode and X-mode energy fluxes emitted by the shock is $\simeq 0.4\sigma^{-1}$. The dominance of the X-mode at $\sigma\gg 1$ is particularly strong, approaching 100% in the spectral band around $2\omega_c$. We also provide a detailed description of the emission mechanism for both X- and O-modes.

CROTON: an automated and variant-aware deep learning framework for predicting CRISPR/Cas9 editing outcomes

Victoria R Li, O. Troyanskaya, Z. Zhang

CRISPR/Cas9 is a revolutionary gene-editing technology that has been widely utilized in biology, biotechnology and medicine. CRISPR/Cas9 editing outcomes depend on local DNA sequences at the target site and are thus predictable. However, existing prediction methods are dependent on both feature and model engineering, which restricts their performance to existing knowledge about CRISPR/Cas9 editing

July 12, 2021

Inverse-Dirichlet Weighting Enables Reliable Training of Physics Informed Neural Networks

Suryanarayana Maddu, Dominik Sturm, Ivo F. Sbalzarini, C. Müller

We characterize and remedy a failure mode that may arise from multi-scale dynamics with scale imbalances during training of deep neural networks, such as Physics Informed Neural Networks (PINNs). PINNs are popular machine-learning templates that allow for seamless integration of physical equation models with data. Their training amounts to solving an optimization problem over a weighted sum of data-fidelity and equation-fidelity objectives. Conflicts between objectives can arise from scale imbalances, heteroscedasticity in the data, stiffness of the physical equation, or from catastrophic interference during sequential training. We explain the training pathology arising from this and propose a simple yet effective inverse-Dirichlet weighting strategy to alleviate the issue. We compare with Sobolev training of neural networks, providing the baseline of analytically ϵ-optimal training. We demonstrate the effectiveness of inverse-Dirichlet weighting in various applications, including a multi-scale model of active turbulence, where we show orders of magnitude improvement in accuracy and convergence over conventional PINN training. For inverse modeling using sequential training, we find that inverse-Dirichlet weighting protects a PINN against catastrophic forgetting.

July 2, 2021

Regression models for compositional data: General log-contrast formulations, proximal optimization, and microbiome data applications

Patrick L. Combettes, C. Müller

Compositional data sets are ubiquitous in science, including geology, ecology, and microbiology. In microbiome research, compositional data primarily arise from high-throughput sequence-based profiling experiments. These data comprise microbial compositions in their natural habitat and are often paired with covariate measurements that characterize physicochemical habitat properties or the physiology of the host. Inferring parsimonious statistical associations between microbial compositions and habitat- or host-specific covariate data is an important step in exploratory data analysis. A standard statistical model linking compositional covariates to continuous outcomes is the linear log-contrast model. This model describes the response as a linear combination of log-ratios of the original compositions and has been extended to the high-dimensional setting via regularization. In this contribution, we propose a general convex optimization model for linear log-contrast regression which includes many previous proposals as special cases. We introduce a proximal algorithm that solves the resulting constrained optimization problem exactly with rigorous convergence guarantees. We illustrate the versatility of our approach by investigating the performance of several model instances on soil and gut microbiome data analysis tasks.
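
To make the linear log-contrast model concrete, here is a minimal sketch of its plain least-squares version: the response is modeled as log(X) @ beta with the constraint sum(beta) = 0. This is not the proximal algorithm proposed in the paper. It omits regularization and the intercept, and the helper names `clr` and `fit_log_contrast` are hypothetical. The sketch relies on the fact that, under the zero-sum constraint, log(X) @ beta equals clr(X) @ beta, and that the minimum-norm least-squares solution on centered log-ratios sums to zero automatically.

```python
import numpy as np

def clr(X):
    """Centered log-ratio transform: log(X) minus its row-wise mean."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

def fit_log_contrast(X, y):
    """Least-squares log-contrast fit with sum(beta) == 0.

    Every row of clr(X) sums to zero, so the all-ones vector lies in its
    null space. The minimum-norm solution returned by lstsq has no
    component in that null space, hence it satisfies the constraint.
    """
    beta, *_ = np.linalg.lstsq(clr(X), y, rcond=1e-10)
    return beta

# Synthetic, noise-free example (illustrative data only).
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(4), size=50)        # 50 compositions, 4 parts
beta_true = np.array([1.0, -0.5, -0.5, 0.0])  # sums to zero
y = np.log(X) @ beta_true
beta_hat = fit_log_contrast(X, y)
```

Because the model depends only on log-ratios, rescaling the rows of `X` (for example, from relative abundances to raw counts) leaves the fitted coefficients unchanged.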

A Bayesian approach for extracting free energy profiles from cryo-electron microscopy experiments using a path collective variable

Julian Giraldo-Barreto, Sebastian Ortiz, E. Thiede, Karen Palacio-Rodriguez, B. Carpenter, A. Barnett, P. Cossio

Cryo-electron microscopy (cryo-EM) extracts single-particle density projections of individual biomolecules. Although cryo-EM is widely used for 3D reconstruction, due to its single-particle nature, it has the potential to provide information about the biomolecule's conformational variability and underlying free energy landscape. However, treating cryo-EM as a single-molecule technique is challenging because of the low signal-to-noise ratio (SNR) in the individual particles. In this work, we developed the cryo-BIFE method, cryo-EM Bayesian Inference of Free Energy profiles, that uses a path collective variable to extract free energy profiles and their uncertainties from cryo-EM images. We tested the framework over several synthetic systems, where we controlled the imaging parameters and conditions. We found that for realistic cryo-EM environments and relevant biomolecular systems, it is possible to recover the underlying free energy, with the pose accuracy and SNR as crucial determinants. Then, we used the method to study the conformational transitions of a calcium-activated channel with real cryo-EM particles. Interestingly, we recover the most probable conformation (used to generate a high resolution reconstruction of the calcium-bound state), and we find two additional meta-stable states, one of which corresponds to the calcium-unbound conformation. As expected for turnover transitions within the same sample, the activation barriers are of the order of a couple of $k_BT$. Extracting free energy profiles from cryo-EM will enable a more complete characterization of the thermodynamic ensemble of biomolecules.

A fast Chebyshev method for the Bingham closure with application to active nematic suspensions

Scott Weady, D. Stein, M. Shelley

Continuum kinetic theories provide an important tool for the analysis and simulation of particle suspensions. When those particles are anisotropic, the addition of a particle orientation vector to the kinetic description yields a $(2d-1)$-dimensional theory which becomes intractable to simulate, especially in three dimensions or near states where the particles are highly aligned. Coarse-grained theories that track only moments of the particle distribution functions provide a more efficient simulation framework, but require closure assumptions. For the particular case where the particles are apolar, the Bingham closure has been found to agree well with the underlying kinetic theory; yet the closure is non-trivial to compute, requiring the solution of an often nearly-singular nonlinear equation at every spatial discretization point at every timestep. In this paper, we present a robust, accurate, and efficient numerical scheme for evaluating the Bingham closure, with a controllable error/efficiency tradeoff. To demonstrate the utility of the method, we carry out high-resolution simulations of a coarse-grained continuum model for a suspension of active particles in parameter regimes inaccessible to kinetic theories. Analysis of these simulations reveals that inaccurately computing the closure can act to effectively limit spatial resolution in the coarse-grained fields. Pushing these simulations to the high spatial resolutions enabled by our method reveals a coupling between vorticity and topological defects in the suspension director field, as well as signatures of energy transfer between scales in this active fluid model.

arXiv e-prints
June 28, 2021

Experiences and lessons learned from two virtual, hands-on microbiome bioinformatics workshops

Matthew R Dillon, Evan Bolyen, Anja Adamov, J. Morton, R. Bonneau, et al.

In October of 2020, in response to the Coronavirus Disease 2019 (COVID-19) pandemic, our team hosted our first fully online workshop teaching the QIIME 2 microbiome bioinformatics platform. We had 75 enrolled participants who joined from at least 25 different countries on 6 continents, and we had 22 instructors on 4 continents. In the 5-day workshop, participants worked hands-on with a cloud-based shared compute cluster that we deployed for this course. The event was well received, and participants provided feedback and suggestions in a postworkshop questionnaire. In January of 2021, we followed this workshop with a second fully online workshop, incorporating lessons from the first. Here, we present details on the technology and protocols that we used to run these workshops, focusing on the first workshop and then introducing changes made for the second workshop. We discuss what worked well, what didn’t work well, and what we plan to do differently in future workshops.
