475 Publications

cryoJAX: A Cryo-electron Microscopy Image Simulation Library In JAX

Michael J. O'Brien, S. Hanson, D. Needleman, et al.

While cryo-electron microscopy (cryo-EM) has come to prominence in the last decade due to its ability to resolve biomolecular complexes at atomic resolution, advancements in experimental and computational methods have made cryo-EM promising for investigating intracellular organization and heterogeneous molecular states. A primary challenge for these alternative applications is the development of techniques for cryo-EM data analysis, which are very computationally demanding. To this end, it is advantageous to leverage advanced scientific computing frameworks for statistical analysis. One such framework is JAX, an emerging array-oriented Python numerical computing package for automatic differentiation and vectorization with a growing ecosystem for statistical inference and machine learning. We have developed cryoJAX, a cryo-EM image-simulation library for building computational data-analysis applications in JAX. CryoJAX is a flexible modeling language for cryo-EM image formation and therefore can support a wide range of data analysis downstream. By integrating with the JAX ecosystem, cryoJAX enables the development and deployment of algorithms for the growing breadth of scientific applications for cryo-EM.

A Lightweight, Geometrically Flexible Fast Algorithm for the Evaluation of Layer and Volume Potentials

F. Fryklund, L. Greengard, S. Jiang, Samuel Potter

Over the last two decades, several fast, robust, and high-order accurate methods have been developed for solving the Poisson equation in complicated geometry using potential theory. In this approach, rather than discretizing the partial differential equation itself, one first evaluates a volume integral to account for the source distribution within the domain, followed by solving a boundary integral equation to impose the specified boundary conditions. Here, we present a new fast algorithm which is easy to implement and compatible with virtually any discretization technique, including unstructured domain triangulations, such as those used in standard finite element or finite volume methods. Our approach combines earlier work on potential theory for the heat equation, asymptotic analysis, the nonuniform fast Fourier transform (NUFFT), and the dual-space multilevel kernel-splitting (DMK) framework. It is insensitive to flaws in the triangulation, permitting not just nonconforming elements, but arbitrary aspect ratio triangles, gaps and various other degeneracies. On a single CPU core, the scheme computes the solution at a rate comparable to that of the fast Fourier transform (FFT) in work per gridpoint.

Low-temperature transport in high-conductivity correlated metals: a density-functional plus dynamical mean-field study of cubic perovskites

H. LaBollita, Jeremy Lee-Hand, Fabian B. Kugler, Lorenzo Van Muñoz, S. Beck, A. Hampel, J. Kaye, A. Georges, Cyrus E. Dreyer

While methods based on density-functional perturbation theory have dramatically improved our understanding of electron-phonon contributions to transport in materials, methods for accurately capturing electron-electron scattering relevant to low temperatures have seen significantly less development. The case of high-conductivity, moderately correlated materials characterized by low scattering rates is particularly challenging, since exquisite numerical precision of the low-energy electronic structure is required. Recent methodological advancements to density-functional theory combined with dynamical mean-field theory (DFT+DMFT), including adaptive Brillouin-zone integration and numerically precise self-energies, enable a rigorous investigation of electron-electron scattering in such materials. In particular, these tools may be leveraged to perform a robust scattering-rate analysis on both real- and imaginary-frequency axes. Applying this methodology to a subset of ABO$_3$ perovskite oxides -- SrVO$_3$, SrMoO$_3$, PbMoO$_3$, and SrRuO$_3$ -- we demonstrate its ability to qualitatively and quantitatively describe electron-electron contributions to the temperature-dependent direct-current resistivity. This combination of numerical techniques offers fundamental insight into the role of electronic correlations in transport phenomena and provides a predictive tool for identifying materials with potential for technological applications.

Understanding the Mechanisms of Fast Hyperparameter Transfer

The growing scale of deep learning models has rendered exhaustive hyperparameter (HP) optimization prohibitively expensive. A promising solution is the use of scale-aware HPs, which can enable direct transfer of optimal settings from small-scale grid searches to large models with minimal performance loss. Such approaches are useful when the optimal settings converge "fast" enough with scale. While approaches like the Maximal Update Parameterization (μP) have empirically displayed fast transfer when scaling model width, a deeper conceptual understanding of the mechanisms that enable this is still missing. Our work establishes a systematic conceptual framework for analyzing fast HP transfer across different synthetic and practical scenarios. In synthetic settings, we present various quantitative examples where transfer either offers a provable computational advantage or fails even under μP. We then propose a key property that enables the fast transfer often observed in practice: through a novel decomposition of the optimization trajectory, we identify one component that rapidly converges with model width and determines the optimal HPs, and another that continues to improve the loss with increased width but has negligible impact on HP choice. We conjecture that this decomposition elucidates the key mechanisms behind fast transfer and empirically validate it in practical settings such as LLM training.

An O(logN) Monte Carlo method for periodic Coulomb systems

Efficient Monte Carlo (MC) sampling of many-body systems with long-range electrostatics is often limited by the cost of per-move energy-difference evaluation under periodic boundary conditions. We present DMK-MC, an accelerated MC method that adapts the dual-space multilevel kernel-splitting (DMK) framework to single-particle Metropolis updates. DMK-MC computes the energy change and, upon acceptance, updates the stored incoming plane-wave fields with O(1) work per tree level, yielding an overall O(logN) expected work per trial move for fixed accuracy. The method decomposes the Coulomb kernel into three components: a global, periodized smooth part; a multilevel sequence of smooth difference kernels whose interactions are restricted to same-level colleague boxes; and a singular residual kernel whose short-range interactions are evaluated directly. Benchmarks on uniform, highly nonuniform, and implicit-solvent electrolyte and colloidal configurations show that DMK-MC consistently outperforms a recent FMM-based O(logN) Monte Carlo method, delivering several-fold speedups at comparable tolerances.
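To make the per-move cost concrete, the following is a minimal sketch of a single-particle Metropolis update with the energy difference computed by direct summation. It is not DMK-MC: it assumes open (non-periodic) boundaries and a bare 1/r kernel, and the function names (`total_energy`, `move_energy_change`, `metropolis_step`) are hypothetical. The direct sum costs O(N) per trial move, which is the baseline that tree-based methods such as the one described above aim to reduce.

```python
import numpy as np

def total_energy(positions, charges):
    """Direct O(N^2) Coulomb energy with open boundaries (toy assumption)."""
    energy = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(positions[i] - positions[j])
            energy += charges[i] * charges[j] / r
    return energy

def move_energy_change(positions, charges, index, new_position):
    """Energy change from moving one particle, by direct O(N) summation."""
    others = np.arange(len(charges)) != index
    old_r = np.linalg.norm(positions[others] - positions[index], axis=1)
    new_r = np.linalg.norm(positions[others] - new_position, axis=1)
    return charges[index] * np.sum(charges[others] * (1.0 / new_r - 1.0 / old_r))

def metropolis_step(positions, charges, beta, step, rng):
    """Propose a random single-particle displacement and accept or reject it."""
    index = rng.integers(len(charges))
    trial = positions[index] + rng.uniform(-step, step, size=3)
    delta = move_energy_change(positions, charges, index, trial)
    accepted = delta <= 0.0 or rng.random() < np.exp(-beta * delta)
    if accepted:
        positions[index] = trial  # update the configuration in place
    return accepted, delta

# Usage: a few trial moves on a small like-charged system.
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 10.0, size=(20, 3))
charges = np.ones(20)
results = [metropolis_step(positions, charges, beta=1.0, step=0.5, rng=rng)
           for _ in range(100)]
```

Only the moved particle's interactions enter the energy difference, so the full energy never needs to be recomputed; the sketch simply evaluates those interactions one by one.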

Fast Ewald Summation with Prolates for Charged Systems in the NPT Ensemble

We present an NPT extension of Ewald summation with prolates (ESP), a spectrally accurate and scalable particle-mesh method for molecular dynamics simulations of periodic, charged systems. Building on the recently introduced ESP framework, this work focuses on rigorous and thermodynamically consistent pressure/stress evaluation in the isothermal-isobaric ensemble. ESP employs prolate spheroidal wave functions as both splitting and spreading kernels, reducing the Fourier grid size needed to reach a prescribed pressure accuracy compared with current widely used mesh-Ewald methods based on Gaussian splitting and B-spline spreading. We derive a unified pressure-tensor formulation applicable to isotropic, semi-isotropic, anisotropic, and fully flexible cells, and show that the long-range pressure can be evaluated with a single forward FFT followed by diagonal scaling, whereas force evaluation requires both forward and inverse transforms. We provide production implementations in LAMMPS and GROMACS and validate pressure and force accuracy on bulk water, LiTFSI ionic liquids, and a transmembrane system. Benchmarks on up to $3\times 10^3$ CPU cores demonstrate strong scaling and reduced communication cost at matched accuracy, particularly for NPT pressure evaluation.

Comparing cryo-EM methods and molecular dynamics simulation to investigate heterogeneity in ligand-bound TRPV1

M. Astore, David Silva-Sánchez, R. Blackwell, P. Cossio, S. Hanson

Cryogenic electron microscopy (cryo-EM) has emerged as a powerful method for resolving the structure of biological macromolecules. Recently, several computational methods have been developed to study the heterogeneity of molecules in single-particle cryo-EM. In this study, we analyze a publicly available dataset of TRPV1 using five such methods: 3DFlex, 3DVA, cryoDRGN, ManifoldEM, and Bayesian ensemble reweighting. We find significant heterogeneity, but each method produces different results, with some detecting only compositional or conformational heterogeneity. To compare these diverse results, we develop AnaVox to quantitatively determine agreement between heterogeneity methods. Furthermore, applying Bayesian ensemble reweighting combined with molecular dynamics simulations supports the presence of these rarer states within the sample. This study shows that although current methods reveal the presence of heterogeneity, their stochasticity and potential bias present challenges for their routine use. However, with future development, these tools will enable the use of cryo-EM data for quantitative biophysical investigations.

Improving Cryo-EM Optimization Robustness with an Optimal Transport Loss Function for Noisy Images

Geoffrey Woollard, David Herreros, P. Cossio, et al.

Many tasks in single-particle cryo-electron microscopy (cryo-EM), such as 2D/3D classification and homo/heterogeneous reconstruction, require optimizing model parameters to minimize the discrepancy between observed data and a forward model. The standard Mean Squared Error (MSE) loss function is computationally efficient but suffers from a non-convex, rugged loss landscape, particularly for high-resolution heterogeneity inference. In this work, we investigate the practical utility of Sliced Wasserstein (SW) distances. We implement exact W2 estimators (inverse-CDF and greedy matching) of projections alongside a computationally efficient proxy based on the L2 norm of CDFs, a formulation akin to the sliced Cramér–von Mises distance. We establish the latter as a robust, fully differentiable workhorse for the cryo-EM forward model. We evaluate its performance against the MSE in joint inference tasks recovering pose, CTF parameters, and conformational heterogeneity. Our results demonstrate that SW significantly broadens the basin of attraction, enabling robust gradient-based optimization from distant initializations where MSE fails. Using a helical spiral toy model, we highlight how SW losses are sensitive to per-particle contrast, where background noise level miscalibration can induce geometric bias in the inferred structure. We show that this bias is manageable through a joint optimization strategy that treats background contrast as a learnable parameter. Finally, we validate the approach on a synthetic dataset using the Zernike3D framework, showing that the SW loss yields accurate landscape representations comparable with those from MSE. These findings establish SW as a powerful tool for navigating the rugged landscapes of cryo-EM forward model parameters.
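The contrast between the MSE and a CDF-based proxy can be illustrated with a 1D toy sketch. This is not the authors' implementation: the function names (`normalize`, `cdf_l2_distance`, `mse`, `gaussian`) and the Gaussian test signals are assumptions chosen for illustration, and in a sliced setting the same comparison would be applied to 1D projections of images. The sketch compares a reference bump with shifted copies: once the bumps stop overlapping, the MSE stops changing with the shift, while the L2 norm of the CDF difference keeps growing.

```python
import numpy as np

def normalize(signal):
    """Clip negative values and scale a 1D signal so that it sums to one."""
    signal = np.clip(np.asarray(signal, dtype=float), 0.0, None)
    return signal / signal.sum()

def cdf_l2_distance(p, q, dx=1.0):
    """L2 norm of the difference between the CDFs of two 1D histograms."""
    cdf_p = np.cumsum(normalize(p))
    cdf_q = np.cumsum(normalize(q))
    return np.sqrt(np.sum((cdf_p - cdf_q) ** 2) * dx)

def mse(p, q):
    """Mean squared error between the two normalized signals."""
    return np.mean((normalize(p) - normalize(q)) ** 2)

def gaussian(x, center, width=1.0):
    """Unnormalized Gaussian bump used as a toy 1D projection."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

x = np.linspace(-20.0, 20.0, 401)
dx = x[1] - x[0]
reference = gaussian(x, 0.0)
shifts = [2.0, 6.0, 10.0, 14.0]

# Distance from the reference to each shifted copy, under both losses.
cdf_values = [cdf_l2_distance(reference, gaussian(x, s), dx) for s in shifts]
mse_values = [mse(reference, gaussian(x, s)) for s in shifts]

for s, c, m in zip(shifts, cdf_values, mse_values):
    print(f"shift={s:5.1f}  cdf_l2={c:.4f}  mse={m:.3e}")
```

In this toy case the flat MSE at large shifts gives no gradient signal toward the reference, whereas the CDF-based distance still increases with the shift, which is one way to picture the broader basin of attraction described above.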

December 27, 2025

Stabilizing the singularity swap quadrature for near-singular line integrals

David Krantz, A. Barnett, Anna-Karin Tornberg

Singularity swap quadrature (SSQ) is an effective method for the evaluation at nearby targets of potentials due to densities on curves in three dimensions. While highly accurate in most settings, it is known to suffer from catastrophic cancellation when the kernel exhibits both near-vanishing numerators and strong singularities, as arises with scalar double layer potentials or tensorial kernels in Stokes flow or linear elasticity. This precision loss turns out to be tied to the interpolation basis, namely monomial (for open curves) or Fourier (for closed curves). We introduce a simple yet powerful remedy: target-specific translated monomial and Fourier bases that explicitly incorporate the near-vanishing behavior of the kernel numerator. We combine this with a stable evaluation of the constant term which now dominates the integral, significantly reducing cancellation. We show that our approach achieves close to machine precision for prototype integrals, and up to ten orders of magnitude lower error than standard SSQ at extremely close evaluation distances, without significant additional computational cost.

Facilitating analysis of open neurophysiology data on the DANDI Archive using large language model tools

The DANDI Archive is a key resource for sharing open neurophysiology data, hosting over 400 datasets in the Neurodata Without Borders (NWB) format. While these datasets hold tremendous potential for reanalysis and discovery, many researchers face barriers to reuse, including unfamiliarity with access methods and difficulty identifying relevant content. Here we introduce an AI-powered, agentic chat assistant and a notebook generation pipeline. The chat assistant serves as an interactive tool for exploring DANDI datasets. It leverages large language models (LLMs) and integrates with agentic tools to guide users through data access, visualization, and preliminary analysis. The notebook generator analyzes dataset structure with minimal human input, executing inspection scripts and generating visualizations. It then produces an instructional Python notebook tailored to the dataset. We applied this system to 12 recent datasets. Review by neurophysiology data specialists found the generated notebooks to be generally accurate and well-structured, with most notebooks rated as “very helpful.” This work demonstrates how AI can support FAIR principles by leveraging data standards and lowering barriers to data reuse and engagement.
