2573 Publications

CROTON: an automated and variant-aware deep learning framework for predicting CRISPR/Cas9 editing outcomes

Victoria R Li, O. Troyanskaya, Z. Zhang

CRISPR/Cas9 is a revolutionary gene-editing technology that has been widely utilized in biology, biotechnology and medicine. CRISPR/Cas9 editing outcomes depend on local DNA sequences at the target site and are thus predictable. However, existing prediction methods are dependent on both feature and model engineering, which restricts their performance to existing knowledge about CRISPR/Cas9 editing

July 12, 2021

Inverse-Dirichlet Weighting Enables Reliable Training of Physics Informed Neural Networks

Suryanarayana Maddu, Dominik Sturm, Ivo F. Sbalzarin, C. Müller

We characterize and remedy a failure mode that may arise from multi-scale dynamics with scale imbalances during training of deep neural networks, such as Physics Informed Neural Networks (PINNs). PINNs are popular machine-learning templates that allow for seamless integration of physical equation models with data. Their training amounts to solving an optimization problem over a weighted sum of data-fidelity and equation-fidelity objectives. Conflicts between objectives can arise from scale imbalances, heteroscedasticity in the data, stiffness of the physical equation, or from catastrophic interference during sequential training. We explain the training pathology that such conflicts cause and propose a simple yet effective inverse-Dirichlet weighting strategy to alleviate the issue. We compare with Sobolev training of neural networks, providing the baseline of analytically ϵ-optimal training. We demonstrate the effectiveness of inverse-Dirichlet weighting in various applications, including a multi-scale model of active turbulence, where we show orders of magnitude improvement in accuracy and convergence over conventional PINN training. For inverse modeling using sequential training, we find that inverse-Dirichlet weighting protects a PINN against catastrophic forgetting.

July 2, 2021

Regression models for compositional data: General log-contrast formulations, proximal optimization, and microbiome data applications

Patrik L Combettes, C. Müller

Compositional data sets are ubiquitous in science, including geology, ecology, and microbiology. In microbiome research, compositional data primarily arise from high-throughput sequence-based profiling experiments. These data comprise microbial compositions in their natural habitat and are often paired with covariate measurements that characterize physicochemical habitat properties or the physiology of the host. Inferring parsimonious statistical associations between microbial compositions and habitat- or host-specific covariate data is an important step in exploratory data analysis. A standard statistical model linking compositional covariates to continuous outcomes is the linear log-contrast model. This model describes the response as a linear combination of log-ratios of the original compositions and has been extended to the high-dimensional setting via regularization. In this contribution, we propose a general convex optimization model for linear log-contrast regression which includes many previous proposals as special cases. We introduce a proximal algorithm that solves the resulting constrained optimization problem exactly with rigorous convergence guarantees. We illustrate the versatility of our approach by investigating the performance of several model instances on soil and gut microbiome data analysis tasks.
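For readers unfamiliar with the model, a minimal Python sketch of the classical linear log-contrast predictor, y = β0 + Σj βj log xj subject to Σj βj = 0, may help. It shows only why the zero-sum constraint matters: the prediction is unchanged when a composition is rescaled, so raw counts and proportions give the same answer. It is not the general convex formulation or the proximal algorithm proposed in the paper, and the function name, counts, and coefficients are hypothetical.

```python
import math

def log_contrast_predict(composition, beta, intercept=0.0):
    """Linear log-contrast prediction: intercept + sum_j beta_j * log(x_j).

    The zero-sum constraint on beta makes the prediction depend only on
    log-ratios of the parts, so rescaling the whole composition (for
    example, raw counts versus proportions) leaves it unchanged.
    """
    if not math.isclose(sum(beta), 0.0, abs_tol=1e-12):
        raise ValueError("log-contrast coefficients must sum to zero")
    if any(x <= 0 for x in composition):
        raise ValueError("log-ratios require strictly positive parts")
    return intercept + sum(b * math.log(x) for b, x in zip(beta, composition))

# Hypothetical read counts for three taxa and hypothetical coefficients.
counts = [120.0, 30.0, 50.0]
proportions = [c / sum(counts) for c in counts]
beta = [1.0, -0.5, -0.5]

y_counts = log_contrast_predict(counts, beta)
y_props = log_contrast_predict(proportions, beta)
print(math.isclose(y_counts, y_props))  # True
```

The same invariance is what lets the model be written equivalently in terms of log-ratios against a reference part; regularized variants add a penalty on β while keeping the constraint.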


A Bayesian approach for extracting free energy profiles from cryo-electron microscopy experiments using a path collective variable

Julian Giraldo-Barreto, Sebastian Ortiz, E. Thiede, Karen Palacio-Rodriguez, B. Carpenter, A. Barnett, P. Cossio

Cryo-electron microscopy (cryo-EM) extracts single-particle density projections of individual biomolecules. Although cryo-EM is widely used for 3D reconstruction, its single-particle nature gives it the potential to provide information about the biomolecule's conformational variability and underlying free energy landscape. However, treating cryo-EM as a single-molecule technique is challenging because of the low signal-to-noise ratio (SNR) in the individual particles. In this work, we developed the cryo-BIFE method (cryo-EM Bayesian Inference of Free Energy profiles), which uses a path collective variable to extract free energy profiles and their uncertainties from cryo-EM images. We tested the framework on several synthetic systems, where we controlled the imaging parameters and conditions. We found that for realistic cryo-EM environments and relevant biomolecular systems, it is possible to recover the underlying free energy, with the pose accuracy and SNR as crucial determinants. We then used the method to study the conformational transitions of a calcium-activated channel with real cryo-EM particles. Interestingly, we recover the most probable conformation (used to generate a high-resolution reconstruction of the calcium-bound state), and we find two additional metastable states, one of which corresponds to the calcium-unbound conformation. As expected for turnover transitions within the same sample, the activation barriers are of the order of a couple $k_BT$. Extracting free energy profiles from cryo-EM will enable a more complete characterization of the thermodynamic ensemble of biomolecules.
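Free energy profiles of this kind rest on the standard Boltzmann relation G(s) = -k_BT ln ρ(s) + const between the population ρ(s) along a collective variable s and its free energy. A minimal Python sketch of that conversion follows, assuming hypothetical populations at five nodes of a path; it is not the cryo-BIFE likelihood or its posterior sampling, and the helper name is illustrative.

```python
import math

def free_energy_profile(populations):
    """Convert populations along a path collective variable into G/k_BT.

    Uses G(s) = -k_BT ln rho(s) + const and fixes the constant so that the
    most populated node sits at zero. Every population must be positive.
    """
    total = math.fsum(populations)
    g = [-math.log(p / total) for p in populations]
    g_min = min(g)
    return [x - g_min for x in g]

# Hypothetical populations at five nodes along the path.
pops = [0.40, 0.10, 0.05, 0.15, 0.30]
profile = free_energy_profile(pops)
print(round(max(profile), 3))  # ln(0.40 / 0.05) = ln 8, about 2.079
```

A population ratio of 8 between two nodes thus corresponds to roughly 2 k_BT, which gives a feel for what "a couple k_BT" means in terms of relative occupancy.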


A fast Chebyshev method for the Bingham closure with application to active nematic suspensions

Scott Weady, D. Stein, M. Shelley

Continuum kinetic theories provide an important tool for the analysis and simulation of particle suspensions. When those particles are anisotropic, the addition of a particle orientation vector to the kinetic description yields a (2d−1)-dimensional theory that becomes intractable to simulate, especially in three dimensions or near states where the particles are highly aligned. Coarse-grained theories that track only moments of the particle distribution functions provide a more efficient simulation framework, but require closure assumptions. For the particular case where the particles are apolar, the Bingham closure has been found to agree well with the underlying kinetic theory; yet the closure is non-trivial to compute, requiring the solution of an often nearly singular nonlinear equation at every spatial discretization point at every timestep. In this paper, we present a robust, accurate, and efficient numerical scheme for evaluating the Bingham closure, with a controllable error/efficiency tradeoff. To demonstrate the utility of the method, we carry out high-resolution simulations of a coarse-grained continuum model for a suspension of active particles in parameter regimes inaccessible to kinetic theories. Analysis of these simulations reveals that inaccurately computing the closure can act to effectively limit spatial resolution in the coarse-grained fields. Pushing these simulations to the high spatial resolutions enabled by our method reveals a coupling between vorticity and topological defects in the suspension director field, as well as signatures of energy transfer between scales in this active fluid model.

arXiv e-prints
June 28, 2021

Experiences and lessons learned from two virtual, hands-on microbiome bioinformatics workshops

Matthew R Dillon, Evan Bolyen, Anja Adamov, J. Morton, R. Bonneau, et al.

In October of 2020, in response to the Coronavirus Disease 2019 (COVID-19) pandemic, our team hosted our first fully online workshop teaching the QIIME 2 microbiome bioinformatics platform. We had 75 enrolled participants who joined from at least 25 different countries on 6 continents, and we had 22 instructors on 4 continents. In the 5-day workshop, participants worked hands-on with a cloud-based shared compute cluster that we deployed for this course. The event was well received, and participants provided feedback and suggestions in a postworkshop questionnaire. In January of 2021, we followed this workshop with a second fully online workshop, incorporating lessons from the first. Here, we present details on the technology and protocols that we used to run these workshops, focusing on the first workshop and then introducing changes made for the second workshop. We discuss what worked well, what didn’t work well, and what we plan to do differently in future workshops.


How crosslink numbers shape the large-scale physics of cytoskeletal materials

S. Fürthauer, M. Shelley

Cytoskeletal networks are the main actuators of cellular mechanics, and a foundational example for active matter physics. In cytoskeletal networks, motion is generated on small scales by filaments that push and pull on each other via molecular-scale motors. These local actuations give rise to large-scale stresses and motion. To understand how microscopic processes can give rise to self-organized behavior on larger scales, it is important to consider what mechanisms mediate long-ranged mechanical interactions in these systems. Two scenarios have been considered in the recent literature. The first concerns relatively sparse systems, in which most of the large-scale momentum transfer is mediated by the solvent in which the cytoskeletal filaments are suspended. The second concerns systems in which filaments are coupled via crosslink molecules throughout. Here, we review the differences and commonalities between the physics of these two regimes. We also survey the literature for the numbers that allow us to place a material within either of these two classes.

June 24, 2021

A causal view on compositional data

Elisabeth Ailer, Niki Kilbertus, C. Müller

Many scientific datasets are compositional in nature. Important examples include species abundances in ecology, rock compositions in geology, topic compositions in large-scale text corpora, and sequencing count data in molecular biology. Here, we provide a causal view on compositional data in an instrumental variable setting where the composition acts as the cause. Throughout, we pay particular attention to the interpretation of compositional causes from the viewpoint of interventions and crisply articulate potential pitfalls for practitioners. Focusing on modern high-dimensional microbiome sequencing data as a timely illustrative use case, our analysis first reveals that popular one-dimensional information-theoretic summary statistics, such as diversity and richness, may be insufficient for drawing causal conclusions from ecological data. Instead, we advocate for multivariate alternatives using statistical data transformations and regression techniques that take the special structure of the compositional sample space into account. In a comparative analysis on synthetic and semi-synthetic data we show the advantages and limitations of our proposal. We posit that our framework may provide a useful starting point for cause-effect estimation in the context of compositional data.
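A short computation makes the point about one-dimensional summaries concrete: Shannon diversity and richness do not depend on which taxon carries which abundance, so two communities dominated by different taxa can receive identical values. The Python sketch below uses hypothetical abundances and the standard textbook definitions; it illustrates the pitfall only and is not the instrumental-variable approach proposed in the paper.

```python
import math

def shannon_diversity(composition):
    """Shannon diversity H = -sum_i p_i ln p_i; zero parts contribute nothing."""
    total = math.fsum(composition)
    return -math.fsum((x / total) * math.log(x / total) for x in composition if x > 0)

def richness(composition):
    """Number of parts with nonzero abundance."""
    return sum(1 for x in composition if x > 0)

# Two hypothetical communities over the same three taxa, dominated by different taxa.
community_a = [0.7, 0.2, 0.1]
community_b = [0.1, 0.2, 0.7]

print(math.isclose(shannon_diversity(community_a), shannon_diversity(community_b)))  # True
print(richness(community_a) == richness(community_b))  # True
```

Any intervention that swaps the dominant taxon is therefore invisible to both summaries, whereas a multivariate representation of the full composition still distinguishes the two communities.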

June 21, 2021

Microtubule reorganization during female meiosis in C. elegans

Ina Lantzsch, S. Fürthauer

Most female meiotic spindles undergo striking morphological changes while transitioning from metaphase to anaphase. The ultrastructure of meiotic spindles, and how changes to this structure correlate with such dramatic spindle rearrangements, remain largely unknown. To address this, we applied light microscopy, large-scale electron tomography and mathematical modeling of female meiotic Caenorhabditis elegans spindles. Combining these approaches, we find that meiotic spindles are dynamic arrays of short microtubules that turn over within seconds. The results show that the metaphase-to-anaphase transition correlates with an increase in microtubule numbers and a decrease in their average length. Detailed analysis of the tomographic data revealed that microtubule length changes significantly during the metaphase-to-anaphase transition. This effect is most pronounced for microtubules located within 150 nm of the chromosome surface. To understand the mechanisms that drive this transition, we developed a mathematical model for the microtubule length distribution that considers microtubule growth, catastrophe, and severing. Using Bayesian inference to compare model predictions and data, we find that microtubule turnover is the major driver of the spindle reorganizations. Our data suggest that in metaphase only a minor fraction of microtubules, those closest to the chromosomes, are severed. The large majority of microtubules, which are not in close contact with chromosomes, do not undergo severing. Instead, their length distribution is fully explained by growth and catastrophe. This suggests that the most prominent drivers of spindle rearrangements are changes in nucleation and catastrophe rate. In addition, we provide evidence that microtubule severing is dependent on katanin.
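In the simplest reading of a growth-and-catastrophe model, a microtubule grows at constant speed v until a catastrophe that occurs at constant rate k, so the length it reaches is exponentially distributed with mean v/k; with steady nucleation and rapid depolymerization after catastrophe, the steady-state length distribution of growing microtubules is the same exponential. The toy Python sketch below checks this by sampling. It is not the authors' model: the parameter values and helper name are hypothetical, and severing, nucleation kinetics, and the Bayesian fit are omitted.

```python
import random

def lengths_at_catastrophe(growth_speed, catastrophe_rate, n, seed=0):
    """Toy model: each microtubule grows at a constant speed until a catastrophe.

    Catastrophes occur at a constant rate, so the growth time is exponential
    with mean 1 / catastrophe_rate, and the length reached is that time
    multiplied by growth_speed.
    """
    rng = random.Random(seed)
    return [growth_speed * rng.expovariate(catastrophe_rate) for _ in range(n)]

v, k = 0.2, 0.1   # hypothetical growth speed (µm/s) and catastrophe rate (1/s)
lengths = lengths_at_catastrophe(v, k, 100_000)
mean_length = sum(lengths) / len(lengths)
frac_above_mean = sum(1 for x in lengths if x > v / k) / len(lengths)
print(mean_length, v / k)   # sample mean should be close to v / k = 2.0 µm
print(frac_above_mean)      # should be close to exp(-1), about 0.368
```

In this toy picture, raising the catastrophe rate or lowering the growth speed shortens the mean length, while raising the nucleation rate increases microtubule numbers without changing the shape of the distribution.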

June 11, 2021

Metallic Microswimmers Driven up the Wall by Gravity

Q. Brosseau, F. Balboa Usabiaga, E. Lushi, Y. Wu, L. Ristroph, M. D. Ward, M. Shelley, J. Zhang

Experiments on autophoretic bimetallic nanorods propelling within a fuel of hydrogen peroxide show that tail-heavy swimmers preferentially orient upwards and ascend along inclined planes. We show that such gravitaxis is strongly facilitated by interactions with solid boundaries, allowing even ultraheavy microswimmers to climb nearly vertical surfaces. Theory and simulations show that the buoyancy or gravitational torque that tends to align the rods is reinforced by a fore-aft drag asymmetry induced by hydrodynamic interactions with the wall.

June 11, 2021
