2795 Publications

On the Superlinear Relationship between SGD Noise Covariance and Loss Landscape Curvature

Yikuan Zhang, Ning Yang, Y. Tu

Stochastic Gradient Descent (SGD) introduces anisotropic noise that is correlated with the local curvature of the loss landscape, thereby biasing optimization toward flat minima. Prior work often assumes an equivalence between the Fisher Information Matrix and the Hessian for negative log-likelihood losses, leading to the claim that the SGD noise covariance $C$ is proportional to the Hessian $H$. We show that this assumption holds only under restrictive conditions that are typically violated in deep neural networks. Using the recently discovered Activity-Weight Duality, we find a more general relationship agnostic to the specific loss formulation, showing that $C \propto \mathbb{E}_p[h_p^2]$, where $h_p$ denotes the per-sample Hessian with $H = \mathbb{E}_p[h_p]$. As a consequence, $C$ and $H$ commute approximately rather than coincide exactly, and their diagonal elements follow an approximate power-law relation $C_{ii} \propto H_{ii}^{\gamma}$ with a theoretically bounded exponent $1 \le \gamma \le 2$, determined by per-sample Hessian spectra. Experiments across datasets, architectures, and loss functions validate these bounds, providing a unified characterization of the noise-curvature relationship in deep learning.
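
The two ends of the exponent range can be illustrated with a toy calculation. The sketch below is not the authors' experiment: it assumes diagonal per-sample Hessians, so that the diagonal of the mean squared per-sample Hessian is an elementwise square, and it uses hand-built curvature patterns. The function name `fit_exponent` and all numerical values are hypothetical. When every sample carries the same curvature, the fitted exponent is 2; when a fixed curvature appears in only a fraction of samples, it is 1.

```python
import numpy as np

def fit_exponent(per_sample_diag):
    """Fit gamma in C_ii ~ H_ii**gamma by a log-log least-squares line.

    per_sample_diag[p, i] is the i-th diagonal curvature of sample p.
    Toy assumption: each per-sample Hessian h_p is diagonal.
    """
    H = per_sample_diag.mean(axis=0)         # H_ii = E_p[h_p]_ii
    C = (per_sample_diag ** 2).mean(axis=0)  # C_ii taken as E_p[h_p^2]_ii
    gamma, _ = np.polyfit(np.log(H), np.log(C), 1)
    return gamma

num_samples = 1000

# Case A: every sample has the same curvature a_i in direction i,
# so C_ii = a_i**2 = H_ii**2.
scales = np.array([0.1, 0.3, 1.0, 3.0, 10.0])
dense = np.tile(scales, (num_samples, 1))

# Case B: curvature c = 2 in only n_i of the samples and 0 elsewhere,
# so H_ii = c * n_i / N and C_ii = c**2 * n_i / N = c * H_ii.
counts = np.array([10, 30, 100, 300, 900])
sparse = np.zeros((num_samples, counts.size))
for i, n in enumerate(counts):
    sparse[:n, i] = 2.0

gamma_dense = fit_exponent(dense)
gamma_sparse = fit_exponent(sparse)
print(gamma_dense, gamma_sparse)
```

Mixtures of these two patterns give fitted exponents between the extremes, which is one way to read the statement that the exponent is set by the per-sample Hessian spectra.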

February 5, 2026

Collective is different: Information exchange and speed-accuracy trade-offs in self-organized patterning

Ashutosh Tripathi, Jörn Dunkel, D. Skinner

During development, highly ordered structures emerge as cells collectively coordinate with each other. While recent advances have clarified how individual cells process and respond to external signals, understanding collective cellular decision making remains a major challenge. Here, we introduce a minimal, analytically tractable model of cell patterning via local cell-cell communication. Using this framework, we identify a trade-off between the speed and accuracy of collective pattern formation and, by adapting techniques from stochastic chemical kinetics, quantify how information flows between cells during patterning. Our analysis reveals counterintuitive features of collective patterning: globally optimized solutions do not necessarily maximize intercellular information transfer and individual cells may appear suboptimal in isolation. Moreover, the model predicts that instantaneous information shared between cells can be nonmonotonic in time as patterning occurs. An analysis of recent experimental data from lateral inhibition in Drosophila pupal abdomen finds a qualitatively similar effect.

February 4, 2026

Neural population geometry and optimal coding of tasks with shared latent structure

Albert J. Wakhloo, Will Slatton, S. Chung

Animals can recognize latent structures in their environment and apply this information to efficiently navigate the world. Several works argue that the brain supports these abilities by forming neural representations from which behaviorally relevant variables can be read out across contexts and tasks. However, it is unclear which features of neural activity facilitate downstream readout. Here we analytically determine the geometric properties of neural activity that govern linear readout generalization on a set of tasks sharing a common latent structure. We show that four statistics summarizing the dimensionality, factorization and correlation structures of neural activity determine generalization. Early in learning, optimal neural representations are lower dimensional and exhibit higher correlations between single units and task variables than late in learning. We support these predictions through biological and artificial neural data analysis. Our results tie the linearly decodable information in neural population activity to its geometry.

Exploring How Workflow Variations in Denaturation-Based Assays Impact Global Protein–Protein Interaction Predictions

Tavis J. Reed, Laura M. Haubold, O. Troyanskaya, et al.

Protein denaturation-based assays, such as thermal proximity coaggregation (TPCA) and ion-based proteome-integrated solubility alteration (I-PISA), are powerful tools for characterizing global protein–protein interaction (PPI) networks. These workflows utilize different denaturation methods to probe PPIs, i.e., thermal- or ion-based. How denaturation differences influence PPI network mapping remains poorly understood. Here, we provide an experimental and computational characterization of the effect of the denaturation-based PPI assay on the observed PPI networks. We establish the value of both soluble and insoluble fractions in PPI prediction, determine the ability to minimize sample amount requirement, and assess different relative quantification methods during virus infection. Generating paired TPCA and I-PISA datasets, we define both overlapping sets of proteins and distinct PPI networks specifically captured by these methods. Assessing protein physical properties and subcellular localizations, we show that size, structural complexity, hydrophobicity, and localization influence PPI detection in a workflow-specific manner. We show that the insoluble fractions expand the detectable PPI landscape, underscoring their value in these workflows. Focusing on selected PPI networks (cytoskeletal and DNA repair), we observe the detection of distinct functional populations. Using influenza A infection as a model for cellular perturbation, we demonstrate that the integration of PPI predictions from soluble and insoluble workflows enhances the ability to build biologically informative and interconnected networks. Examining the effects of reducing starting material for TPCA assays, we find that PPI prediction quality remains robust when using a single well of a 96-well plate, a ∼500× reduction in sample input from usual workflows. Introducing simple workflow modifications, we show that label-free data-independent acquisition (DIA) TPCA yields performance comparable to the traditional tandem mass tag (TMT) data-dependent acquisition (DDA) TPCA workflow. This work provides insights into denaturation-based assays, highlights the value of insoluble fractions, and offers practical improvements for enhancing global PPI network mapping.

Nonequilibrium Thermodynamics of Biochemical Networks: Energetics of Cellular Functions

I review recent advances in nonequilibrium thermodynamics of biochemical networks, organized around two central questions. First, why is free energy dissipation essential for enabling or enhancing biological function? Second, how do energetic costs constrain functional performance? Using several representative systems—beginning with the classical kinetic proofreading mechanism and extending to more recent examples such as accurate sensory adaptation, ultrasensitive responses, and synchronization of biochemical oscillators—I show that this framework not only provides new insights into the molecular mechanisms underlying these diverse processes but also reveals the general thermodynamic principles that govern their biological functions. I highlight the characteristic signatures of nonequilibrium behavior and the emergence of fundamental energy–performance trade-offs. This review strives to present the framework pedagogically and with sufficient technical detail to enable theory-inclined biophysicists to apply it to their own systems of interest. I conclude by proposing a nonequilibrium thermodynamic law for living systems and outlining promising directions for extending this theoretical approach to an even broader range of biological phenomena.

Sparse input representations explain odor discrimination in complex, concentration-varying mixtures

Hannah McCalmon, George Cai, Constantine Tsibouris, Farhad Pashakhanloo, S. Chung, Vikrant Kapoor, Venkatesh N Murthy

In natural environments, animals must detect behaviorally relevant odors despite variability in both odor mixture composition and stimulus intensity. Although mice can identify salient odors embedded in complex mixtures, how target concentration and background complexity jointly constrain discrimination remains unclear. We trained mice in a two-alternative forced choice task to identify target odors embedded in mixtures containing up to 16 background components. After performance stabilized, we systematically varied target odor concentration. Discrimination accuracy declined with decreasing target concentration but showed little additional dependence on background complexity. Using a biophysically grounded model of olfactory bulb glomerular responses, we show that linear decoding reproduces behavioral performance when intrinsic neural noise dominates over background-driven variability. Manifold capacity analysis revealed that neural representations remain efficiently structured for odor discrimination despite variation in background complexity. These results define a noise-limited regime of olfactory discrimination in which target detectability is primarily constrained by neural sensitivity rather than background interference.

January 28, 2026

Understanding the Mechanisms of Fast Hyperparameter Transfer

The growing scale of deep learning models has rendered exhaustive hyperparameter (HP) optimization prohibitively expensive. A promising solution is the use of scale-aware HPs, which can enable direct transfer of optimal settings from small-scale grid searches to large models with minimal performance loss. Such approaches are useful when the optimal settings converge "fast" enough with scale. While approaches like the Maximal Update Parameterization (μP) have empirically displayed fast transfer when scaling model width, a deeper conceptual understanding of the mechanisms that enable this is still missing. Our work establishes a systematic conceptual framework for analyzing fast HP transfer across different synthetic and practical scenarios. In synthetic settings, we present various quantitative examples where transfer either offers a provable computational advantage or fails even under μP. We then propose a key property that enables the fast transfer often observed in practice: through a novel decomposition of the optimization trajectory, we identify one component that rapidly converges with model width and determines the optimal HPs, and another that continues to improve the loss with increased width but has negligible impact on HP choice. We conjecture that this decomposition elucidates the key mechanisms behind fast transfer and empirically validate it in practical settings such as LLM training.

Clocks and Dominoes: Timing Mechanisms of Embryogenesis

Yonghyun Song, Brian D. Leahy, D. Needleman, et al.

How developmental timings are regulated is a fundamental open question. Two widely considered mechanisms are the clock, in which an internal timer determines when each stage occurs, and the domino, in which the completion of each stage triggers the next. It is often unclear how to establish either mechanism. Here, we construct a quantitative framework that uses the correlation structure of developmental timings to test the clock and domino mechanisms. We apply this framework to human pre-implantation development by using ~1 million images of 2946 embryos acquired during IVF treatment, establishing mathematical models of developmental rate. We find that a domino mechanism governs the cleavage timings, while a pronuclei fade-triggered clock mechanism governs the morula and blastocyst timings. These results are consistent with the cell cycle oscillator governing the cleavage timings and the accumulation of embryonic gene products or the degradation of maternally deposited factors governing the morula and blastocyst timings. We next investigate the physiological regulators of developmental timing by analyzing how the timings are statistically associated with the clinical pregnancy outcome. While embryos that result in a clinical pregnancy tend to exhibit shorter cleavage timings, this association is primarily driven by patient-specific properties. In contrast, embryo-specific properties independently influence the pregnancy outcome and the cleavage timings, so that factors directly determining implantation potential, such as aneuploidy, can only weakly impact the cleavage timings. Taken together, this work provides a robust framework for decoding developmental timing mechanisms, with significant implications for fundamental biology and clinical practice.
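
One way to see how a correlation structure can separate the two mechanisms is a toy simulation. This is a generic sketch, not the authors' framework or data: it assumes Gaussian timing noise, equal mean gaps between stages, and a clock whose stage times are all measured from a shared starting event. Every name and parameter value is hypothetical. Under these assumptions, adjacent inter-stage intervals are anticorrelated for a clock, because both intervals share the jitter of the middle stage, and uncorrelated for a domino.

```python
import numpy as np

rng = np.random.default_rng(0)
n_embryos, n_stages = 20000, 4
mean_gap, sigma = 10.0, 1.0  # hypothetical units and noise level

# Clock: each stage fires at a preset time after a shared start,
# with independent jitter per stage.
schedule = mean_gap * np.arange(1, n_stages + 1)
clock_times = schedule + sigma * rng.standard_normal((n_embryos, n_stages))

# Domino: each stage begins when the previous one completes,
# so stage times are cumulative sums of independent durations.
durations = mean_gap + sigma * rng.standard_normal((n_embryos, n_stages))
domino_times = np.cumsum(durations, axis=1)

def adjacent_interval_corr(times):
    """Correlation between the first two consecutive inter-stage intervals."""
    intervals = np.diff(times, axis=1)  # shape (n_embryos, n_stages - 1)
    return np.corrcoef(intervals[:, 0], intervals[:, 1])[0, 1]

clock_corr = adjacent_interval_corr(clock_times)
domino_corr = adjacent_interval_corr(domino_times)
print(f"clock: {clock_corr:.3f}, domino: {domino_corr:.3f}")
```

In this idealized setting the clock value approaches -0.5 and the domino value approaches 0. Real timing data would mix both contributions with measurement error and patient-level variation.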

January 26, 2026

Neurons as Detectors of Coherent Sets in Sensory Dynamics

We model sensory streams as observations from high-dimensional stochastic dynamical systems and conceptualize sensory neurons as self-supervised learners of compact representations of such dynamics. From prior experience, neurons learn coherent sets (regions of stimulus state space whose trajectories evolve cohesively over finite times) and assign membership indices to new stimuli. Coherent sets are identified via spectral clustering of the stochastic Koopman operator (SKO), where the sign pattern of a subdominant singular function partitions the state space into minimally coupled regions. For multivariate Ornstein-Uhlenbeck processes, this singular function reduces to a linear projection onto the dominant singular vector of the whitened state-transition matrix. Encoding this singular vector as a receptive field enables neurons to compute membership indices via the projection sign in a biologically plausible manner. Each neuron detects either a predictive coherent set (stimuli with common futures) or a retrospective coherent set (stimuli with common pasts), suggesting a functional dichotomy among neurons. Since neurons lack access to explicit dynamical equations, the requisite singular vectors must be estimated directly from data, for example, via past-future canonical correlation analysis on lag-vector representations, an approach that naturally extends to nonlinear dynamics. This framework provides a novel account of neuronal temporal filtering, the ubiquity of rectification in neural responses, and known functional dichotomies. Coherent-set clustering thus emerges as a fundamental computation underlying sensory processing and transferable to bio-inspired artificial systems.

Multiscale Clustering and Source Separation of InSight Mission Seismic Data

Ali Siahkoohi, R. Morel, R. Balestriero, Erwan Allys, Grégory Sainton, Taichi Kawamura

Unsupervised source separation involves unraveling an unknown set of source signals recorded through a mixing operator, with limited prior knowledge about the sources, and only access to a dataset of signal mixtures. This problem is inherently ill-posed and is further challenged by the variety of timescales exhibited by sources in time series data from planetary space missions. As such, a systematic multiscale unsupervised approach is needed to identify and separate sources at different timescales. Existing methods typically rely on a preselected window size that determines their operating timescale, limiting their capacity to handle multiscale sources. To address this issue, we propose an unsupervised multiscale clustering and source separation framework by leveraging wavelet scattering spectra that provide a low-dimensional representation of stochastic processes, capable of distinguishing between different non-Gaussian stochastic processes. Nested within this representation space, we develop a factorial variational autoencoder (VAE) that is trained to probabilistically cluster sources at different timescales. To perform source separation, we use samples from clusters at multiple timescales obtained via the factorial VAE (fVAE) as prior information, and formulate an optimization problem in the wavelet scattering spectra representation space. When applied to the entire seismic dataset recorded during the NASA Interior Exploration using Seismic Investigations, Geodesy and Heat Transport (InSight) mission on Mars, containing sources varying greatly in timescale, our approach disentangles such different sources, e.g., minute-long transient one-sided pulses (known as “glitches”) and structured ambient noises resulting from atmospheric activities that typically last for tens of minutes, and provides an opportunity to conduct further investigations into the isolated sources.
