2795 Publications

Transient learning dynamics drive escape from sharp valleys in Stochastic Gradient Descent

Ning Yang, Yikuan Zhang, Qi Ouyang, Chao Tang, Y. Tu

Stochastic gradient descent (SGD) is central to deep learning, yet the dynamical origin of its preference for flatter, more generalizable solutions remains unclear. Here, by analyzing SGD learning dynamics, we identify a nonequilibrium mechanism governing solution selection. Numerical experiments reveal a transient exploratory phase in which SGD trajectories repeatedly escape sharp valleys and transition toward flatter regions of the loss landscape. By using a tractable physical model, we show that the SGD noise reshapes the landscape into an effective potential that favors flat solutions. Crucially, we uncover a transient freezing mechanism: as training proceeds, growing energy barriers suppress inter-valley transitions and ultimately trap the dynamics within a single basin. Increasing the SGD noise strength delays this freezing, which enhances convergence to flatter minima. Together, these results provide a unified physical framework linking learning dynamics, loss-landscape geometry, and generalization, and suggest principles for the design of more effective optimization algorithms.

January 16, 2026

Learning normalized image densities via dual score matching

F. Guth, Z. Kadkhodaie, E. P. Simoncelli

Learning probability models from data is at the heart of many machine learning endeavors, but is notoriously difficult due to the curse of dimensionality. We introduce a new framework for learning normalized energy (log probability) models that is inspired by diffusion generative models, which rely on networks optimized to estimate the score. We modify a score network architecture to compute an energy while preserving its inductive biases. The gradient of this energy network with respect to its input image is the score of the learned density, which can be optimized using a denoising objective. Importantly, the gradient with respect to the noise level provides an additional score that can be optimized with a novel secondary objective, ensuring consistent and normalized energies across noise levels. We train an energy network with this dual score matching objective on the ImageNet64 dataset, and obtain a cross-entropy (negative log likelihood) value comparable to the state of the art. We further validate our approach by showing that our energy model strongly generalizes: estimated log probabilities are nearly independent of the specific images in the training set. Finally, we demonstrate that both image probability and dimensionality of local neighborhoods vary significantly with image content, in contrast with traditional assumptions such as concentration of measure or support on a low-dimensional manifold.
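
The two gradients named in this abstract can be illustrated on a toy case where everything is known in closed form. The sketch below is not the authors' network or training objective: it assumes one-dimensional Gaussian data with standard deviation `s`, takes the energy to be the negative log density of the noisy data (a sign convention chosen here), and uses hypothetical function names. It checks by finite differences that the energy yields a score in the input and a score in the noise level, and that the density stays normalized at every noise level.

```python
import math

def energy(x, sigma, s=1.0):
    """Negative log density of N(0, s^2) data blurred by N(0, sigma^2) noise (toy assumption)."""
    var = s * s + sigma * sigma
    return 0.5 * x * x / var + 0.5 * math.log(2.0 * math.pi * var)

def space_score(x, sigma, h=1e-5):
    """Score with respect to the input: -dE/dx, by central finite difference."""
    return -(energy(x + h, sigma) - energy(x - h, sigma)) / (2.0 * h)

def noise_score(x, sigma, h=1e-5):
    """Score with respect to the noise level: -dE/dsigma, by central finite difference."""
    return -(energy(x, sigma + h) - energy(x, sigma - h)) / (2.0 * h)

def total_mass(sigma, lo=-40.0, hi=40.0, n=40001):
    """Trapezoid-rule integral of exp(-E) over x; equals 1 when E is normalized."""
    dx = (hi - lo) / (n - 1)
    vals = [math.exp(-energy(lo + i * dx, sigma)) for i in range(n)]
    return dx * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
```

In this toy setting, `space_score(x, sigma)` agrees with the analytic value `-x / (s**2 + sigma**2)`, and `total_mass(sigma)` stays at 1 as `sigma` changes, which is the kind of consistency across noise levels that the secondary objective is described as enforcing.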

An O(logN) Monte Carlo method for periodic Coulomb systems

Efficient Monte Carlo (MC) sampling of many-body systems with long-range electrostatics is often limited by the cost of per-move energy-difference evaluation under periodic boundary conditions. We present DMK-MC, an accelerated MC method that adapts the dual-space multilevel kernel-splitting (DMK) framework to single-particle Metropolis updates. DMK-MC computes the energy change and, upon acceptance, updates the stored incoming plane-wave fields with O(1) work per tree level, yielding an overall O(logN) expected work per trial move for fixed accuracy. The method decomposes the Coulomb kernel into three components: a global, periodized smooth part; a multilevel sequence of smooth difference kernels whose interactions are restricted to same-level colleague boxes; and a singular residual kernel whose short-range interactions are evaluated directly. Benchmarks on uniform, highly nonuniform, and implicit-solvent electrolyte and colloidal configurations show that DMK-MC consistently outperforms a recent FMM-based O(logN) Monte Carlo method, delivering several-fold speedups at comparable tolerances.

Mechanical origin for non-equilibrium ultrasensitivity in the bacterial flagellar motor

Flagellar motors enable bacteria to navigate their environments by switching rotation direction in response to external cues with high sensitivity. Previous work indicated that the ultrasensitivity of the flagellar motor originates from conformational spread, in which subunits of the switching complex are strongly coupled to their neighbours as in an equilibrium Ising model. However, dynamic single-motor measurements indicated that rotation switching is driven out of equilibrium, and the mechanism for this dissipative driving remains unknown. Here we propose that local mechanical torques on motor subunits can affect their conformation dynamics, based on recent structures observed with cryo-electron microscopy. This gives rise to a tug of war between stator-associated subunits that produces cooperative, non-equilibrium switching responses without requiring nearest-neighbour interactions. Our model predicts that the motor response cooperativity grows with the number of stators driving rotation, which is consistent with published experimental results. Finally, we show that operating out of equilibrium enables motors to achieve high cooperativity with faster responses compared with equilibrium motors. Our results indicate a general role for mechanics in sensitive chemical regulation.

An O(N) quasi-Ewald splitting method for nanoconfined electrostatics

Zecheng Gan, X. Gao, Yuqing Li

Simulating the dynamics of charged particles in quasi-two-dimensional (quasi-2D) nanoconfined systems presents a significant computational challenge due to the long-range nature of electrostatic interactions and the geometric anisotropy. To address this, we introduce a novel quasi-Ewald splitting strategy tailored for particle-based simulations in such geometry. Our splitting strategy seamlessly integrates a collection of advanced numerical techniques, including optimal quadrature rules [L. N. Trefethen, SIAM Rev. 64(1)(2022), pp.132-150], fast pairwise kernel summation methods [S. Jiang and L. Greengard, Commun. Comput. Phys. 31(1)(2022), pp.1-26], and the random batch method with importance sampling in k-space [S. Jin, L. Li, Z. Xu et al., SIAM J. Sci. Comput. 43(4)(2021), pp.B937-B960]. The resulting algorithm achieves an O(N) overall computational complexity, where N denotes the total number of confined particles. Simulations of several prototype systems validate the accuracy and efficiency of our method. Furthermore, we present numerical observations specifically related to nanoconfined charged many-body systems, highlighting phenomena such as dielectric boundary effects, anisotropic diffusion, and the structure of the electrical double layer (EDL) under conditions of charge asymmetry.

Scalable inference of functional neural connectivity at submillisecond timescales

A. Medvedeva, E. Balzani, A. Williams, Stephen L Keeley

The Poisson Generalized Linear Model (GLM) is a foundational tool for analyzing neural spike train data. However, standard implementations rely on discretizing spike times into binned count data, limiting temporal resolution and scalability. Here, we develop Monte Carlo (MC) methods and polynomial approximations (PA) to the continuous-time analog of these models, and show them to be advantageous over their discrete-time counterparts. Further, we propose using a set of exponentially scaled Laguerre polynomials as an orthogonal temporal basis, which improves filter identification and yields closed-form integral solutions under the polynomial approximation. Applied to both synthetic and real spike-time data from rodent hippocampus, our methods demonstrate superior accuracy and scalability compared to traditional binned GLMs, enabling functional connectivity inference in large-scale neural recordings that are temporally precise on the order of synaptic dynamical timescales and in agreement with known anatomical properties of hippocampal subregions. We provide open-source implementations of both MC and PA estimators, optimized for GPU acceleration, to facilitate adoption in the neuroscience community.
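
For orientation, the continuous-time quantity that replaces binned counts is the point-process log-likelihood, sum_i log λ(t_i) − ∫₀ᵀ λ(t) dt, where the integral is what a Monte Carlo estimator approximates. The sketch below is a minimal illustration under stated assumptions, not the authors' released estimator: the function name, the uniform sampling scheme, and the `log_rate` callable are hypothetical, and no Laguerre basis or GPU code is shown.

```python
import numpy as np

def mc_log_likelihood(spike_times, log_rate, T, n_samples=1000, seed=0):
    """Continuous-time Poisson log-likelihood on [0, T].

    Computes sum_i log lambda(t_i) - integral_0^T lambda(t) dt, with the
    integral replaced by a uniform Monte Carlo estimate (assumed scheme).
    `log_rate` maps an array of times to an array of log intensities.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, T, size=n_samples)          # random evaluation times
    integral = T * np.mean(np.exp(log_rate(u)))      # unbiased estimate of the integral
    spikes = np.asarray(spike_times, dtype=float)
    return float(np.sum(log_rate(spikes)) - integral)
```

Because the spike times enter at full precision, no bin width has to be chosen; the cost of the integral term is set by `n_samples` rather than by the number of time bins.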

Automated Machine Learning Pipeline: Large Language Models-Assisted Automated Data set Generation for Training Machine-Learned Interatomic Potentials

Adam Lahouari, J. Rogal, Mark E. Tuckerman

Machine learning interatomic potentials (MLIPs) have become powerful tools to extend molecular simulations beyond the limits of quantum methods, offering near-quantum accuracy at much lower computational cost. Yet, developing reliable MLIPs remains difficult because it requires generating high-quality datasets, preprocessing atomic structures, and carefully training and validating models. In this work, we introduce an Automated Machine Learning Pipeline (AMLP) that unifies the entire workflow from dataset creation to model validation. AMLP employs large-language-model agents to assist with electronic-structure code selection, input preparation, and output conversion, while its analysis suite (AMLP-Analysis), based on ASE, supports a range of molecular simulations. The pipeline is built on the MACE architecture and validated on acridine polymorphs, where, with a straightforward fine-tuning of a foundation model, mean absolute errors of 1.7 meV/atom in energies and 7.0 meV/Å in forces are achieved. The fitted MLIP reproduces DFT geometries with sub-Å accuracy and demonstrates stability during molecular dynamics simulations in the microcanonical and canonical ensembles.

Size-Consistent Adiabatic Connection Functionals via Orbital-Based Matrix Interpolation

We introduce a size-consistent and orbital-invariant formalism for constructing correlation functionals based on the adiabatic connection for density functional theory (DFT). By constructing correlation energy matrices for the weak and strong correlation limits in the space of occupied orbitals, our method, which we call orbital-based size-consistent matrix interpolation (OSMI), avoids previous difficulties in the construction of size-consistent adiabatic connection functionals. We design a simple, nonempirical adiabatic connection and a one-parameter strong-interaction limit functional, and we show that the resulting method reproduces the correlation energy of the uniform electron gas over a wide range of densities. When applied to subsets of the GMTKN55 thermochemistry database, OSMI is more accurate on average than MP2 and nonempirical density functionals. Most notably, OSMI provides excellent predictions of the barrier heights we tested, with average errors of less than 2 kcal mol⁻¹.

Fast Ewald Summation with Prolates for Charged Systems in the NPT Ensemble

We present an NPT extension of Ewald summation with prolates (ESP), a spectrally accurate and scalable particle-mesh method for molecular dynamics simulations of periodic, charged systems. Building on the recently introduced ESP framework, this work focuses on rigorous and thermodynamically consistent pressure/stress evaluation in the isothermal-isobaric ensemble. ESP employs prolate spheroidal wave functions as both splitting and spreading kernels, reducing the Fourier grid size needed to reach a prescribed pressure accuracy compared with current widely used mesh-Ewald methods based on Gaussian splitting and B-spline spreading. We derive a unified pressure-tensor formulation applicable to isotropic, semi-isotropic, anisotropic, and fully flexible cells, and show that the long-range pressure can be evaluated with a single forward FFT followed by diagonal scaling, whereas force evaluation requires both forward and inverse transforms. We provide production implementations in LAMMPS and GROMACS and validate pressure and force accuracy on bulk water, LiTFSI ionic liquids, and a transmembrane system. Benchmarks on up to 3×10³ CPU cores demonstrate strong scaling and reduced communication cost at matched accuracy, particularly for NPT pressure evaluation.
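
As background for the comparison drawn in this abstract, the classical Gaussian splitting used by conventional mesh-Ewald methods writes 1/r = erfc(αr)/r + erf(αr)/r. The sketch below shows only that baseline identity; it is not the prolate-based splitting of ESP, and the function name and the parameter `alpha` are illustrative assumptions.

```python
import math

def ewald_split(r, alpha):
    """Classical Gaussian Ewald splitting of the Coulomb kernel 1/r.

    short = erfc(alpha * r) / r decays rapidly and is summed directly in real space;
    long  = erf(alpha * r) / r is smooth and is handled on a Fourier grid.
    """
    short = math.erfc(alpha * r) / r
    long_ = math.erf(alpha * r) / r
    return short, long_
```

The two parts always sum back to 1/r; a larger `alpha` shortens the real-space cutoff at the price of a finer Fourier grid, which is the trade-off a different choice of splitting kernel aims to improve.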

An Evidence-Grounded Research Assistant for Functional Genomics and Drug Target Assessment

Ksenia Sokolova, O. Troyanskaya, et al.

The growing availability of biological data resources has transformed research, yet their effective use remains challenging: selecting appropriate sources requires domain knowledge, data are fragmented across databases, and synthesizing results into reliable conclusions is labor-intensive. Although large language models promise to address these barriers, their impact in biomedicine has been limited by unsupported statements, incorrect claims, and lack of provenance. We introduce Alvessa, an evidence-grounded agentic research assistant designed around verifiability. Alvessa integrates entity recognition, orchestration of pre-validated biological tools, and data-constrained answer generation with statement-level verification against retrieved records, explicitly flagging unsupported claims and guiding revision when reliability criteria are not met. We evaluate Alvessa on dbQA from LAB-Bench and GenomeArena, a benchmark of 720 questions spanning gene and variant annotation, pathways, molecular interactions, miRNA targets, drug-target evidence, protein structure, and gene-phenotype associations. Alvessa substantially improves accuracy relative to general-purpose language models and performs comparably to coding-centric agents while producing fully traceable outputs. Using adversarial perturbations, we show that detection of fabricated statements depends critically on access to retrieved evidence. We further demonstrate application to drug discovery, where evidence-grounded synthesis enables identification of candidate targets missed or misattributed by literature-centered reasoning alone. Alvessa and GenomeArena are released to the community to support reproducible, verifiable AI-assisted biological research.

December 31, 2025