2697 Publications

Cryo-electron microscopy ensemble optimization using individual particles and physical constraints

David Silva-Sánchez, E. Thiede, Roy R. Lederman, P. Cossio

Biomolecules are inherently dynamic, and characterizing their conformational ensemble distributions is essential for understanding their dynamics and biological roles. Cryo-electron microscopy (cryo-EM), a technique that images individual biomolecules frozen in a thin layer of amorphous ice, has emerged as a leading method for determining the structure of biomolecules at atomic resolution. Recent advances in cryo-EM reconstruction have made significant progress in determining structure in heterogeneous conformational landscapes. In contrast to reconstruction, a different class of techniques has been used to infer population weights, referred to as ensemble reweighting. These methods have yet to be generalized to infer structural heterogeneity simultaneously. Here, we present a method for cryo-EM ensemble optimization that directly infers the optimal set of structures and their associated population weights from cryo-EM images using Bayesian optimization techniques. Our method iterates between optimizing the structures and weights using a likelihood defined in terms of cryo-EM particle images (not reconstructions) and projecting onto the domain of a physical prior through an approach inspired by projected gradient descent. We test the method on several systems, ranging from a four-atom toy model to a large protein system with real cryo-EM data. We find that our approach successfully recovers the structures and their associated weights across a wide range of experimental conditions, even when the number of structures does not match the actual number of metastable states. Our method paves the way for cryo-EM ensemble optimization of flexible biomolecules exhibiting complex, multimodal conformational landscapes.
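The weight-inference half of the alternating scheme can be illustrated with a toy sketch: projected gradient ascent on a mixture log-likelihood over the probability simplex. The per-image likelihood matrix `L` below is synthetic, and the simplex projection stands in for the physical-prior projection; this is not the paper's pipeline, only an illustration of the projected-gradient idea.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def optimize_weights(L, steps=500, lr=0.1):
    """Projected gradient ascent on (1/n) * sum_i log(sum_s w_s L[i, s]),
    keeping w on the simplex after every step."""
    n, S = L.shape
    w = np.full(S, 1.0 / S)
    for _ in range(steps):
        p = L @ w                              # per-image mixture likelihood
        grad = (L / p[:, None]).sum(0) / n     # gradient of average log-likelihood
        w = project_simplex(w + lr * grad)
    return w

rng = np.random.default_rng(0)
true_w = np.array([0.7, 0.2, 0.1])
# synthetic per-image likelihoods, peaked at the generating structure
states = rng.choice(3, size=2000, p=true_w)
L = rng.uniform(0.01, 0.1, size=(2000, 3))
L[np.arange(2000), states] += 1.0
w = optimize_weights(L)
```

With sharply peaked likelihoods the recovered weights approach the generating distribution; real cryo-EM images are far noisier, which is why the joint structure-and-weight optimization is the hard part.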

December 4, 2025

From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers

Ryotaro Kawata, Yujin Song, A. Bietti, Naoki Nishikawa, Taiji Suzuki, Samuel Vaiter, D. Wu

Transformers can implement both generalizable algorithms (e.g., induction heads) and simple positional shortcuts (e.g., memorizing fixed output positions). In this work, we study how the choice of pretraining data distribution steers a shallow transformer toward one behavior or the other. Focusing on a minimal trigger-output prediction task -- copying the token immediately following a special trigger upon its second occurrence -- we present a rigorous analysis of gradient-based training of a single-layer transformer. In both the infinite and finite sample regimes, we prove a transition in the learned mechanism: if input sequences exhibit sufficient diversity, measured by a low “max-sum” ratio of trigger-to-trigger distances, the trained model implements an induction head and generalizes to unseen contexts; by contrast, when this ratio is large, the model resorts to a positional shortcut and fails to generalize out-of-distribution (OOD). We also reveal a trade-off between the pretraining context length and OOD generalization, and derive the optimal pretraining distribution that minimizes computational cost per sample. Finally, we validate our theoretical predictions with controlled synthetic experiments, demonstrating that broadening context distributions robustly induces induction heads and enables OOD generalization. Our results shed light on the algorithmic biases of pretrained transformers and offer conceptual guidelines for data-driven control of their learned behaviors.
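One plausible operationalization of the "max-sum" diversity statistic described above is the largest trigger-to-trigger distance in the data divided by the sum of all such distances (lower means more diverse gap patterns). The reading below is an assumption for illustration, not the paper's exact definition:

```python
import numpy as np

def trigger_gaps(seq, trigger):
    """Distances between consecutive occurrences of the trigger token."""
    pos = np.flatnonzero(np.asarray(seq) == trigger)
    return np.diff(pos)

def max_sum_ratio(seqs, trigger):
    """Largest trigger-to-trigger gap over the sum of all gaps;
    a small ratio indicates diverse trigger spacings."""
    gaps = np.concatenate([trigger_gaps(s, trigger) for s in seqs])
    return gaps.max() / gaps.sum()

# diverse gap lengths -> small ratio; a single gap pattern -> ratio of 1
diverse = [[9, 1, 2, 9, 3, 9, 4, 5, 6, 9], [9, 7, 9, 8, 1, 2, 3, 9]]
uniform = [[9, 1, 2, 3, 4, 5, 6, 7, 9]]
r_div = max_sum_ratio(diverse, trigger=9)
r_uni = max_sum_ratio(uniform, trigger=9)
```

Under the paper's result, pretraining distributions driving this ratio down should push the trained transformer toward an induction head rather than a positional shortcut.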


Emergence of Linear Truth Encodings in Language Models

Shauli Ravfogel, Gilad Yehudai, Tal Linzen, Joan Bruna, A. Bietti

Recent probing studies reveal that large language models exhibit linear subspaces that separate true from false statements, yet the mechanism behind their emergence is unclear. We introduce a transparent, one-layer transformer toy model that reproduces such truth subspaces end-to-end and exposes one concrete route by which they can arise. We study one simple setting in which truth encoding can emerge: a data distribution where factual statements co-occur with other factual statements (and vice-versa), encouraging the model to learn this distinction in order to lower the LM loss on future tokens. We corroborate this pattern with experiments in pretrained language models. Finally, in the toy setting we observe a two-phase learning dynamic: networks first memorize individual factual associations in a few steps, then---over a longer horizon---learn to linearly separate true from false, which in turn lowers language-modeling loss. Together, these results provide both a mechanistic demonstration and an empirical motivation for how and why linear truth representations can emerge in language models.
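A minimal version of the probing setup referenced above is a "mass-mean" linear probe: take the difference of mean hidden states for true and false statements as the candidate truth direction. The hidden states below are synthetic (a planted direction plus noise), standing in for real model activations:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32
# hypothetical hidden states: true/false statements separated
# along one planted latent direction plus isotropic noise
truth_dir = rng.normal(size=d)
truth_dir /= np.linalg.norm(truth_dir)
H_true = 2.0 * truth_dir + rng.normal(scale=0.5, size=(200, d))
H_false = -2.0 * truth_dir + rng.normal(scale=0.5, size=(200, d))

# mass-mean probe: difference of class means as the truth direction
probe = H_true.mean(0) - H_false.mean(0)
probe /= np.linalg.norm(probe)

scores_true = H_true @ probe
scores_false = H_false @ probe
acc = ((scores_true > 0).mean() + (scores_false < 0).mean()) / 2
```

When a linear truth subspace exists, such a probe recovers it with near-perfect separation; the paper's contribution is a mechanistic account of why training produces such a direction in the first place.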


Space-time adaptive methods for parabolic evolution equations

We present a family of integral equation-based solvers for the heat equation, reaction-diffusion systems, the unsteady Stokes equation and the incompressible Navier-Stokes equations in two space dimensions. Our emphasis is on the development of methods that can efficiently follow complex solution features in space-time by refinement and coarsening at each time step on an adaptive quadtree. For simplicity, we focus on problems posed in a square domain with periodic boundary conditions. The performance and robustness of the methods are illustrated with several numerical examples.
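For the periodic square domain mentioned above, the simplest non-adaptive baseline is an exact Fourier-space step for the heat equation, where each mode decays independently. This sketch is only that baseline, not the paper's adaptive integral-equation solvers:

```python
import numpy as np

def heat_step_periodic(u, dt, nu=1.0):
    """Advance u_t = nu * (u_xx + u_yy) on the periodic unit square
    by one step, exactly in Fourier space: each mode is damped by
    exp(-nu * |k|^2 * dt)."""
    n = u.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    decay = np.exp(-nu * (kx**2 + ky**2) * dt)
    return np.fft.ifft2(np.fft.fft2(u) * decay).real

n = 64
x = np.linspace(0, 1, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
u0 = np.sin(2 * np.pi * X) * np.cos(4 * np.pi * Y)
u1 = heat_step_periodic(u0, dt=0.01, nu=1.0)
# this single mode has |k|^2 = (2*pi)^2 + (4*pi)^2 = (4 + 16) * pi^2
expected = np.exp(-(4 + 16) * np.pi**2 * 0.01) * u0
```

The uniform-grid cost here is what spatially localized solution features make wasteful, motivating the quadtree refinement and coarsening the paper develops.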


Randomized block-Krylov subspace methods for low-rank approximation of matrix functions

D. Persson, Tyler Chen, Christopher Musco

The randomized SVD is a method to compute an inexpensive, yet accurate, low-rank approximation of a matrix. The algorithm assumes access to the matrix through matrix-vector products (matvecs). Therefore, when we would like to apply the randomized SVD to a matrix function, f(A), one needs to approximate matvecs with f(A) using some other algorithm, which is typically treated as a black-box. Chen and Hallman (SIMAX 2023) argued that, in the common setting where matvecs with f(A) are approximated using Krylov subspace methods (KSMs), a more efficient low-rank approximation is possible if we open this black-box. They present an alternative approach that significantly outperforms the naive combination of KSMs with the randomized SVD, although the method lacked theoretical justification. In this work, we take a closer look at the method, and provide strong and intuitive error bounds that justify its excellent performance for low-rank approximation of matrix functions.
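The black-box baseline discussed above can be sketched in a few lines: a randomized SVD that touches f(A) only through matvecs. Here f(A) = A^2 on a synthetic symmetric matrix with decaying spectrum, with each matvec with f(A) realized as two matvecs with A (a stand-in for a Krylov approximation); this illustrates the baseline the paper improves on, not the opened-box method itself:

```python
import numpy as np

def randomized_svd_sym(matvec, n, rank, oversample=10, seed=0):
    """Randomized SVD of a symmetric matrix f(A) accessed only
    through matvecs (the black-box setting)."""
    rng = np.random.default_rng(seed)
    Omega = rng.normal(size=(n, rank + oversample))
    Q, _ = np.linalg.qr(matvec(Omega))   # orthonormal basis for the range sketch
    B = matvec(Q).T                      # equals Q.T @ f(A) by symmetry
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_b)[:, :rank], s[:rank], Vt[:rank]

n, rank = 200, 5
rng = np.random.default_rng(1)
lam = 0.5 ** np.arange(n)                # rapidly decaying eigenvalues
Qr, _ = np.linalg.qr(rng.normal(size=(n, n)))
A = (Qr * lam) @ Qr.T                    # symmetric test matrix
matvec = lambda V: A @ (A @ V)           # f(A) = A^2 via two matvecs with A
U, s, Vt = randomized_svd_sym(matvec, n, rank)
err = np.linalg.norm(A @ A - (U * s) @ Vt) / np.linalg.norm(A @ A)
```

Each sketch column here costs two full matvecs with A; reusing the Krylov information built while approximating f(A), as the analyzed method does, is what removes that redundancy.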


Truncated kernel windowed Fourier projection: a fast algorithm for the 3D free-space wave equation

We present a spectrally accurate fast algorithm for evaluating the solution to the scalar wave equation in free space driven by a large collection of point sources in a bounded domain. With $M$ sources temporally discretized by $N_t$ time steps of size $\Delta t$, a naive potential evaluation at $M$ targets on the same time grid requires $\mathcal O(M^2 N_t)$ work. Our scheme requires $\mathcal{O}\left((M + N^3\log N)N_t\right)$ work, where $N$ scales as $\mathcal O(1/\Delta t)$, i.e., the maximum signal frequency. This is achieved by using the recently proposed windowed Fourier projection (WFP) method to split the potential into a local part, evaluated directly, plus a smooth history part approximated by an $N^3$-point equispaced discretization of the Fourier transform, where each Fourier coefficient obeys a simple recursion relation. The growing oscillations in the spectral representation (which would be present with a naive use of the Fourier transform) are controlled by spatially truncating the hyperbolic Green's function itself. Thus, the method avoids the need for absorbing boundary conditions. We demonstrate the performance of our algorithm with up to a million sources and targets at 6-digit accuracy. We believe it can serve as a key component in addressing time-domain wave equation scattering problems.
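The kind of per-coefficient recursion mentioned above can be seen in the simplest setting: for the homogeneous 1D periodic wave equation, each Fourier mode satisfies an exact three-term recursion in time. This toy (1D, periodic, source-free) only illustrates why recursing on Fourier coefficients is cheap; the paper's scheme is 3D, free-space, and source-driven:

```python
import numpy as np

def wave_recursion_step(u_prev, u_curr, c, dt):
    """Exact per-mode recursion for u_tt = c^2 u_xx on a periodic grid:
        u_k(t + dt) = 2 cos(c k dt) u_k(t) - u_k(t - dt),
    which follows from u_k(t) being a combination of cos/sin(c k t)."""
    n = len(u_curr)
    k = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / n)
    U_next = 2 * np.cos(c * k * dt) * np.fft.fft(u_curr) - np.fft.fft(u_prev)
    return np.fft.ifft(U_next).real

n, c, dt = 128, 1.0, 0.01
x = np.linspace(0, 1, n, endpoint=False)
# standing wave: u(x, t) = cos(2 pi c t) sin(2 pi x)
u0 = np.sin(2 * np.pi * x)
u_prev = np.cos(2 * np.pi * c * (-dt)) * u0   # exact solution at t = -dt
u = u0
for _ in range(100):                           # advance to t = 1.0
    u_prev, u = u, wave_recursion_step(u_prev, u, c, dt)
exact = np.cos(2 * np.pi * c * 100 * dt) * u0
```

Because the recursion is exact per mode, the only error is floating-point roundoff; the difficulty the paper addresses is doing this in free space, where a naive Fourier representation develops growing oscillations.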


Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model

Rio Alexa Fear, Payel Mukhopadhyay, M. McCabe, A. Bietti, M. Cranmer

Recent advances in mechanistic interpretability have revealed that large language models (LLMs) develop internal representations corresponding not only to concrete entities but also distinct, human-understandable abstract concepts and behaviour. Moreover, these hidden features can be directly manipulated to steer model behaviour. However, it remains an open question whether this phenomenon is unique to models trained on inherently structured data (i.e., language, images) or if it is a general property of foundation models. In this work, we investigate the internal representations of a large physics-focused foundation model. Inspired by recent work identifying single directions in activation space for complex behaviours in LLMs, we extract activation vectors from the model during forward passes over simulation datasets for different physical regimes. We then compute "delta" representations between the two regimes. These delta tensors act as concept directions in activation space, encoding specific physical features. By injecting these concept directions back into the model during inference, we can steer its predictions, demonstrating causal control over physical behaviours, such as inducing or removing some particular physical feature from a simulation. These results suggest that scientific foundation models learn generalised representations of physical principles. They do not merely rely on superficial correlations and patterns in the simulations. Our findings open new avenues for understanding and controlling scientific foundation models and have implications for AI-enabled scientific discovery.
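The delta-representation recipe described above is, in its simplest form, a difference of mean activations between two regimes, added back to a hidden state at inference time. The activations below are synthetic with a planted regime difference, standing in for real model internals:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
# hypothetical activations from forward passes over two simulation
# regimes (e.g. with / without a physical feature), planted on coord 0
acts_regime_a = rng.normal(size=(500, d)) + 1.5 * np.eye(d)[0]
acts_regime_b = rng.normal(size=(500, d))

# "delta" concept direction: difference of mean activations
delta = acts_regime_a.mean(0) - acts_regime_b.mean(0)

def steer(hidden, delta, alpha=1.0):
    """Inject the concept direction into a hidden state at inference;
    alpha > 0 pushes toward regime A, alpha < 0 away from it."""
    return hidden + alpha * delta

h = rng.normal(size=d)
h_steered = steer(h, delta, alpha=2.0)
```

Averaging over many samples cancels regime-independent variation, so the delta isolates whatever the two regimes consistently differ in; the causal test is whether injecting it actually changes the model's predicted physics.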


The Determinant Ratio Matrix Approach to Solving 3D Matching and 2D Orthographic Projection Alignment Tasks

Andrew J. Hanson, S. Hanson

Pose estimation is a general problem in computer vision with wide applications. The relative orientation of a 3D reference object can be determined from a 3D rotated version of that object, or from a projection of the rotated object to a 2D planar image. This projection can be a perspective projection (the PnP problem) or an orthographic projection (the OnP problem). We restrict our attention here to the OnP problem and the full 3D pose estimation task (the EnP problem). Here we solve the least squares systems for both the error-free EnP and OnP problems in terms of the determinant ratio matrix (DRaM) approach. The noisy-data case can be addressed with a straightforward rotation correction scheme. While the SVD and optimal quaternion eigensystem methods solve the noisy EnP 3D-3D alignment exactly, the noisy 3D-2D orthographic (OnP) task has no known comparable closed form, and can be solved by DRaM-class methods. We note that while previous similar work has been presented in the literature exploiting both the QR decomposition and the Moore-Penrose pseudoinverse transformations, here we place these methods in a larger context that has not previously been fully recognized in the absence of the corresponding DRaM solution. We term this class of solutions the DRaM family, and compare the behavior of the families of solutions for the EnP and OnP rotation estimation problems. Overall, this work presents both a new solution to the 3D and 2D orthographic pose estimation problems and provides valuable insight into these classes of problems. With hindsight, we are able to show that our DRaM solutions to the exact EnP and OnP problems possess derivations that could have been discovered in the time of Gauss, and in fact generalize to all analogous N-dimensional Euclidean pose estimation problems.
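The SVD solution to the 3D-3D (EnP) alignment that the abstract cites as exact is the classical Kabsch procedure, sketched below on noise-free synthetic points (the DRaM approach itself is not shown here):

```python
import numpy as np

def kabsch(P, Q):
    """SVD (Kabsch) solution to the EnP alignment: the rotation R
    minimizing ||R P - Q||_F for centered 3 x N point sets."""
    H = P @ Q.T
    U, _, Vt = np.linalg.svd(H)
    # sign correction keeps det(R) = +1 (a rotation, not a reflection)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])
    return Vt.T @ D @ U.T

rng = np.random.default_rng(0)
P = rng.normal(size=(3, 20))
P -= P.mean(axis=1, keepdims=True)       # center the reference cloud
t = np.pi / 6                            # known rotation: 30 deg about z
R_true = np.array([[np.cos(t), -np.sin(t), 0.0],
                   [np.sin(t),  np.cos(t), 0.0],
                   [0.0, 0.0, 1.0]])
Q = R_true @ P
R_est = kabsch(P, Q)
```

For the OnP problem, the third row of the point data is lost to the orthographic projection, which is exactly why no comparable closed form is known and DRaM-class methods are needed.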

November 24, 2025

Cellular and Spatial Drivers of Unresolved Injury and Functional Decline in the Human Kidney

Blue B. Lake, X. Chen, R. Sealfon, O. Troyanskaya, et al.

Building upon a foundational Human Kidney resource, we present a comprehensive multi-modal atlas that defines spatially resolved versus unresolved repair states and mechanisms in human kidney disease. Homeostatic interactions between injured kidney epithelium and its surrounding milieu determine successful repair outcomes, while pathogenic signaling promotes unresolved inflammation and fibrosis leading to chronic disease. We integrated multiple single-cell and spatial modalities across ∼700 samples from >350 patients (∼250 research biopsies), analyzing ∼1.7 million cells alongside complementary mouse multi-omic profiles spanning acute-to-chronic injury and aging (>300,000 cells) and spatial transcriptomic analysis of >150 human biopsies. This cross-species atlas delineates functional pathways and druggable targets across the nephron and defines gene regulatory networks and chromatin landscapes governing tubular, fibroblast, and immune cell transitions from injury to either recovery or failed repair states. We identified distinct cellular states associated with specific pathological features that show dynamic distributions between acute kidney injury (AKI) and chronic kidney disease (CKD), organized within unique spatial niches that reveal progression mechanisms from early injury to unresolved disease. Gene regulatory analyses prioritized key transcription factor activities (SOX4, SOX9, NFKB1, REL, KLFs) and their target networks establishing disease states and tissue microenvironments. These regulatory programs were directly linked to clinical outcomes, identifying molecular signatures of recovery and secreted biomarkers predictive of AKI-to-CKD progression, providing a key resource for therapeutic development and precision medicine approaches in kidney disease.

November 24, 2025

Walrus: A Cross-Domain Foundation Model for Continuum Dynamics

M. McCabe, Payel Mukhopadhyay, Tanya Marwah, B. Régaldo-Saint Blancard, Francois Rozet, Cristiana Diaconu, Lucas Meyer, Kaze W. K. Wong, Hadi Sotoudeh, A. Bietti, Irina Espejo, Tom Hehir, S. Golkar, Keiya Hirashima, G. Krawezik, F. Lanusse, R. Morel, R. Ohana, L. Parker, M. Pettee, Jeff Shen, K. Cho, M. Cranmer, S. Ho

Foundation models have transformed machine learning for language and vision, but achieving comparable impact in physical simulation remains a challenge. Data heterogeneity and unstable long-term dynamics inhibit learning from sufficiently diverse dynamics, while varying resolutions and dimensionalities challenge efficient training on modern hardware. Through empirical and theoretical analysis, we incorporate new approaches to mitigate these obstacles, including a harmonic-analysis-based stabilization method, load-balanced distributed 2D and 3D training strategies, and compute-adaptive tokenization. Using these tools, we develop Walrus, a transformer-based foundation model developed primarily for fluid-like continuum dynamics. Walrus is pretrained on nineteen diverse scenarios spanning astrophysics, geoscience, rheology, plasma physics, acoustics, and classical fluids. Experiments show that Walrus outperforms prior foundation models on both short and long term prediction horizons on downstream tasks and across the breadth of pretraining data, while ablation studies confirm the value of our contributions to forecast stability, training throughput, and transfer performance over conventional approaches. Code and weights are released for community use.
