481 Publications

Multi-scale simulations of MUT-16 scaffold protein phase separation and client recognition

Kumar Gaurav, Virginia Busetto, S. Hanson, et al.

Phase separation of proteins plays a critical role in cellular organization. How phase-separated protein condensates underpin biological function and how condensates achieve specificity remain elusive. We investigated the phase separation of MUT-16, a scaffold protein in Mutator foci, and its role in recruiting the client protein MUT-8, a key component in RNA silencing in Caenorhabditis elegans. We employed a multi-scale approach that combined coarse-grained (residue-level CALVADOS2 and near-atomistic Martini3) and atomistic simulations. Simulations across different resolutions provide a consistent perspective on how MUT-16 condensates recruit MUT-8, enabling the fine-tuning of chemical details and balancing the computational cost. Both coarse-grained models (CALVADOS2 and Martini3) predicted the relative phase-separation propensities of MUT-16’s disordered regions, which we confirmed through in vitro experiments. Simulations also identified key sequence features and residues driving phase separation and revealed differences in residue interaction propensities between CALVADOS2 and Martini3. Furthermore, Martini3 and 350-μs atomistic simulations on Folding@Home of MUT-8’s N-terminal prion-like domain with MUT-16 M8BR cluster highlighted the importance of cation-π interactions between Tyr residues of MUT-8 and Arg residues of MUT-16 M8BR. Lys residues were observed to be more prone to interact in Martini3. Atomistic simulations revealed that the guanidinium group of Arg also engages in interactions and hydrogen bonds with the backbone of Tyr, possibly contributing to the greater strength of Arg-Tyr interactions compared to Lys-Tyr, where these additional favorable contacts are absent. In agreement with our simulations, in vitro co-expression pull-down experiments demonstrated a progressive loss of MUT-8 recruitment after the mutation of Arg in MUT-16 M8BR to Lys or Ala, confirming the critical role of Arg in this interaction. These findings advance our understanding of MUT-16 phase separation and subsequent MUT-8 recruitment, key processes in assembling Mutator foci that drive RNA silencing in C. elegans.

jaxhps: An elliptic PDE solver built with machine learning in mind

O. Melia, D. Fortunato, Jeremy Hoskins, Rebecca Willett

Elliptic partial differential equations (PDEs) can model many physical phenomena, such as electrostatics, acoustics, wave propagation, and diffusion. In scientific machine learning settings, a high-throughput PDE solver may be required to generate a training dataset, run in the inner loop of an iterative algorithm, or interface directly with a deep neural network. To provide value to machine learning users, such a PDE solver must be compatible with standard automatic differentiation frameworks, scale efficiently when run on graphics processing units (GPUs), and maintain high accuracy for a large range of input parameters. We have designed the jaxhps package with these use cases in mind by implementing a highly efficient and accurate solver for elliptic problems with native hardware acceleration and automatic differentiation support. This is achieved by expressing a highly efficient solution method for elliptic PDEs in JAX (Bradbury et al., 2018). This software implements algorithms specifically designed for fast GPU execution of a family of elliptic PDE solvers, which are described in full in Melia et al. (2025).

Automated evaluation of imaginary time strong coupling diagrams by sum-of-exponentials hybridization fitting

Zhen Huang, D. Golez, Hugo U. R. Strand, J. Kaye

We present an efficient separation of variables algorithm for the evaluation of imaginary time Feynman diagrams appearing in the bold pseudo-particle strong coupling expansion of the Anderson impurity model. The algorithm uses a fitting method based on AAA rational approximation and numerical optimization to obtain a sum-of-exponentials expansion of the hybridization function, which is then used to decompose the diagrams. A diagrammatic formulation of the algorithm leads to an automated procedure for diagrams of arbitrary order and topology. We also present methods of stabilizing the self-consistent solution of the pseudo-particle Dyson equation. The result is a low-cost and high-order accurate impurity solver for quantum embedding methods using general multi-orbital hybridization functions at low temperatures, appropriate for low-to-intermediate expansion orders. In addition to other benchmark examples, we use our solver to perform a dynamical mean-field theory study of a minimal model of the strongly correlated compound Ca₂RuO₄, describing the anti-ferromagnetic transition and the in- and out-of-plane anisotropy induced by spin-orbit coupling.
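
As a rough illustration of why a sum-of-exponentials expansion permits separation of variables, consider a generic nested double integral rather than an actual strong coupling diagram. The symbols $c_k$, $\omega_k$, $p$, $f$, and $g$ below are illustrative assumptions, not notation taken from the paper, and the real diagrams involve pseudo-particle propagators and more intricate time ordering.

```latex
% Assumed sum-of-exponentials fit of the hybridization function:
\Delta(\tau) \approx \sum_{k=1}^{p} c_k \, e^{-\omega_k \tau}
% Each term factorizes in the two time arguments:
\Delta(\tau_1 - \tau_2) \approx \sum_{k=1}^{p} c_k \, e^{-\omega_k \tau_1} \, e^{\omega_k \tau_2}
% so a nested double integral splits into sequential one-dimensional integrals:
\int_0^{\tau} d\tau_1 \int_0^{\tau_1} d\tau_2 \, f(\tau_1) \, \Delta(\tau_1 - \tau_2) \, g(\tau_2)
\approx \sum_{k=1}^{p} c_k \int_0^{\tau} d\tau_1 \, f(\tau_1) \, e^{-\omega_k \tau_1}
\int_0^{\tau_1} d\tau_2 \, e^{\omega_k \tau_2} \, g(\tau_2)
```

Under these assumptions, the cost of the coupled integral is replaced by $p$ sequences of one-dimensional integrals, which is the general idea behind decomposing diagrams once such an expansion is available.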

The Inaugural Flatiron Institute Cryo-EM Conformational Heterogeneity Challenge

M. Astore, P. Cossio, S. Hanson, et al.

Despite the rise of single particle cryo-electron microscopy (cryo-EM) as a premier method for resolving macromolecular structures at atomic resolution, methods to address molecular heterogeneity in vitrified samples have yet to reach maturity. With an increasing number of new methods to analyze the multitude of heterogeneous states captured in single particle images, a systematic approach to validation in this field is needed. With this motivation, we issued a challenge to the community to analyze two cryo-EM particle image sets of thyroglobulin that exhibit continuous conformational heterogeneity. The first dataset was experimental and the second was generated with a simulator, allowing control over the distribution of molecular structures and enabling direct comparison between participants’ submissions and the ground truth molecular structures and distributions. Participants were asked to submit 80 volumes representing the heterogeneous ensemble and estimate their respective populations in the image sets provided. Participation of the research community in the challenge was strong, with submissions from nearly all developers of heterogeneity methods, resulting in 41 submissions across both datasets. Submissions qualitatively exceeded expectations, with the molecular motions identified by methods resembling both each other and the ground truth motion. However, quantitatively assessing these similarities was a challenge in and of itself. In the process of assessing the submissions, we developed several validation metrics, most of which require reference to the underlying ground truth volumes. However, we have also explored the use of metrics that do not necessarily reference ground truth. This is particularly apt for experimental datasets where ground truth is inaccessible. These approaches allowed us to assess the similarity and accuracy in volume quality, molecular motions, and conformational distribution of different submissions. These metrics and the efforts of all participants help chart a path forward for the improvements of heterogeneity methods for cryo-EM and for future challenges to validate these new methods as they continue to be developed by the community.

Functions on Symmetric Matrices and Point Clouds via Lightweight Invariant Features from Galois Theory

Ben Blum-Smith, T. Huang, Marco Cuturi, S. Villar

In this work, we present a mathematical formulation for machine learning of (1) functions on symmetric matrices that are invariant with respect to the action of permutations by conjugation, and (2) functions on point clouds that are invariant with respect to rotations, reflections, and permutations of the points. To achieve this, we provide a construction of generically separating invariant features using ideas inspired by Galois theory. We construct O(n²) invariant features derived from generators for the field of rational functions on n × n symmetric matrices that are invariant for joint permutations of rows and columns. We show that these invariant features can separate all distinct orbits of symmetric matrices, except for a measure zero set; such features can be used to universally approximate invariant functions on almost all weighted graphs. For point clouds in a fixed dimension, we prove that the number of invariant features can be reduced, generically without losing expressivity, to O(n), where n is the number of points. We combine these invariant features with DeepSets to learn functions on symmetric matrices and point clouds with varying sizes. We empirically demonstrate the feasibility of our approach on molecule property regression and point cloud distance prediction.
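
To make the notion of invariance under permutation by conjugation concrete, here is a minimal NumPy sketch. The function `invariant_features` and its power-sum features are hypothetical and chosen only for illustration: they are invariant under X → P X Pᵀ, but they are not the Galois-theoretic features constructed in the paper and are not claimed to be generically separating.

```python
import numpy as np


def invariant_features(X, max_power=3):
    """Toy permutation-invariant summaries of a symmetric matrix X.

    Hypothetical features for illustration only: power sums of the diagonal
    entries and of the off-diagonal entries. Conjugation by a permutation
    matrix permutes each of these two sets among themselves, so the sums
    are unchanged.
    """
    n = X.shape[0]
    diag = np.diag(X)
    off_diag = X[~np.eye(n, dtype=bool)]
    feats = []
    for k in range(1, max_power + 1):
        feats.append(np.sum(diag ** k))
        feats.append(np.sum(off_diag ** k))
    return np.array(feats)


rng = np.random.default_rng(0)
n = 5

# Random symmetric matrix.
A = rng.normal(size=(n, n))
X = (A + A.T) / 2

# Random permutation matrix P; conjugation relabels rows and columns jointly.
P = np.eye(n)[rng.permutation(n)]
X_perm = P @ X @ P.T

f_X = invariant_features(X)
f_perm = invariant_features(X_perm)

# The features agree up to floating-point rounding.
print(np.allclose(f_X, f_perm))
```

Features of this kind could in principle be fed to a downstream regressor, but distinguishing almost all orbits requires a more careful construction such as the one the abstract describes.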

CosmoBench: A Multiscale, Multiview, Multitask Cosmology Benchmark for Geometric Deep Learning

T. Huang, R. Stiskalek, Jun-Young Lee, A. E. Bayer, Charles Margossian, et al.

Cosmological simulations provide a wealth of data in the form of point clouds and directed trees. A crucial goal is to extract insights from this data that shed light on the nature and composition of the Universe. In this paper we introduce CosmoBench, a benchmark dataset curated from state-of-the-art cosmological simulations whose runs required more than 41 million core-hours and generated over two petabytes of data. CosmoBench is the largest dataset of its kind: it contains 34 thousand point clouds from simulations of dark matter halos and galaxies at three different length scales, as well as 25 thousand directed trees that record the formation history of halos on two different time scales. The data in CosmoBench can be used for multiple tasks -- to predict cosmological parameters from point clouds and merger trees, to predict the velocities of individual halos and galaxies from their collective positions, and to reconstruct merger trees on finer time scales from those on coarser time scales. We provide several baselines on these tasks, some based on established approaches from cosmological modeling and others rooted in machine learning. For the latter, we study different approaches -- from simple linear models that are minimally constrained by symmetries to much larger and more computationally demanding models in deep learning, such as graph neural networks. We find that least-squares fits with a handful of invariant features sometimes outperform deep architectures with many more parameters and far longer training time. Still, there remains tremendous potential to improve these baselines by combining machine learning and cosmology to fully exploit the data. CosmoBench sets the stage for bridging cosmology and geometric deep learning at scale. We invite the community to push the frontier of scientific discovery by engaging with this dataset, available at this https URL

Higher-order continuum models for twisted bilayer graphene

S. Quinn, Tianyu Kong, M. Luskin, Alexander B. Watson

The first-order continuum partial differential equation (PDE) model proposed by Bistritzer and MacDonald [Proc. Natl. Acad. Sci. U. S. A. 108, 12233–12237 (2011)] accurately describes the single-particle electronic properties of twisted bilayer graphene at small twist angles. In this paper, we obtain higher-order corrections to the Bistritzer–MacDonald (BM) model via a systematic multiple-scales expansion. We prove that the solution of the resulting higher-order PDE model accurately approximates the corresponding tight-binding wave function under a natural choice of parameters and given initial conditions that are spectrally localized to the monolayer Dirac points. Numerical simulations of tight-binding and continuum dynamics demonstrate the validity of the higher-order continuum model. Symmetries of the higher-order models are also discussed. This work extends the analysis from Watson et al., J. Math. Phys. 64, 031502 (2023), which rigorously established the validity of the (first-order) BM model.

Machine-Learning Interatomic Potentials for Long-Range Systems

Yajie Ji, J. Liang, Zhenli Xu

Machine-learning interatomic potentials have emerged as a revolutionary class of force-field models in molecular simulations, delivering quantum-mechanical accuracy at a fraction of the computational cost and enabling the simulation of large-scale systems over extended timescales. However, they often focus on modeling local environments, neglecting crucial long-range interactions. We propose a sum-of-Gaussians neural network (SOG-Net), a lightweight and versatile framework for integrating long-range interactions into a machine-learning force field. The SOG-Net employs a latent-variable learning network that seamlessly bridges short-range and long-range components, coupled with an efficient Fourier convolution layer that incorporates long-range effects. By learning sum-of-Gaussians multipliers across different convolution layers, the SOG-Net adaptively captures diverse long-range decay behaviors while maintaining close-to-linear computational complexity during training and simulation via nonuniform fast Fourier transforms. The method is shown to be effective for a broad range of long-range systems.
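
As background on why sums of Gaussians pair naturally with Fourier-space convolution, the following identities sketch how a slowly decaying kernel such as $1/r$ can be written as a superposition of Gaussians. The weights $w_\ell$ and bandwidths $s_\ell$ are illustrative symbols, not the learned multipliers of the SOG-Net, and this is not a formula taken from the paper.

```latex
% Exact integral representation of the Coulomb kernel:
\frac{1}{r} = \frac{2}{\sqrt{\pi}} \int_0^{\infty} e^{-r^2 t^2} \, dt
% Discretizing the integral gives a sum-of-Gaussians approximation
% (w_\ell and s_\ell are assumed quadrature weights and bandwidths):
\frac{1}{r} \approx \sum_{\ell} w_\ell \, e^{-r^2 / s_\ell^2}
% Each Gaussian has a Gaussian Fourier transform in three dimensions:
\int_{\mathbb{R}^3} e^{-|\mathbf{r}|^2 / s_\ell^2} \, e^{-i \mathbf{k} \cdot \mathbf{r}} \, d^3 r
= \pi^{3/2} s_\ell^{3} \, e^{-s_\ell^2 |\mathbf{k}|^2 / 4}
```

Because each term is smooth and decays rapidly in Fourier space, convolution with such a kernel reduces to a multiplication in $\mathbf{k}$-space, which is the setting in which fast Fourier transforms give near-linear cost.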

Neutral Gas Phase Distribution from H I Morphology: Phase Separation with Scattering Spectra and Variational Autoencoders

Minjie Lei, S. E. Clark, R. Morel, et al.

Unraveling the multiphase structure of the diffuse interstellar medium as traced by neutral hydrogen (H I) is essential to understanding the lifecycle of the Milky Way. However, H I phase separation is a challenging and underconstrained problem. The neutral gas phase distribution is often inferred from the spectral line structure of H I emission. In this work, we develop a data-driven phase-separation method that extracts H I phase structure solely from the spatial morphology of H I emission intensity structures. We combine scattering spectra (SS) statistics with a Gaussian-mixture variational autoencoder model to (1) derive an interpretable statistical model of different H I phases from their multiscale morphological structures, and (2) use this model to decompose the 2D channel maps of GALFA-H I emission in diffuse high-latitude (|b| > 30°) regions over narrow velocity channels (Δv = 3 km/s) into cold neutral medium (CNM), warm neutral medium (WNM), and noise components. We integrate our CNM map over velocity channels to compare it to an existing map produced by a spectrum-based method. We find that the two maps are highly correlated, but ours recovers more spatially coherent structures at small scales. Our work illustrates and quantifies a clear physical connection between the H I morphology and H I phase structure, and it unlocks a new avenue for improving future phase-separation techniques by making use of both H I spectral and spatial information to decompose H I in 3D position–position–velocity space. These results are consistent with a physical picture where processes that drive H I phase transitions also shape the morphology of H I gas, imprinting a sparse, filamentary CNM that forms out of a diffuse, extended WNM.

Universal Spectral Tokenization via Self-Supervised Panchromatic Representation Learning

Jeff Shen, F. Lanusse, Liam Holden Parker, L. Sarra, A. Bietti, R. Morel, et al.

Sequential scientific data span many resolutions and domains, and unifying them into a common representation is a key step toward developing foundation models for the sciences. Astronomical spectra exemplify this challenge: massive surveys have collected millions of spectra across a wide range of wavelengths and resolutions, yet analyses remain fragmented across spectral domains (e.g., optical vs. infrared) and object types (e.g., stars vs. galaxies), limiting the ability to pool information across datasets. We present a deep learning model that jointly learns from heterogeneous spectra in a self-supervised manner. Our universal spectral tokenizer processes spectra from a variety of object types and resolutions directly on their native wavelength grids, producing intrinsically aligned, homogeneous, and physically meaningful representations that can be efficiently adapted to achieve competitive performance across a range of downstream tasks. For the first time, we demonstrate that a single model can unify spectral data across resolutions and domains, suggesting that our model can serve as a powerful building block for foundation models in astronomy—and potentially extend to other scientific domains with heterogeneous sequential data, such as climate and healthcare.
