Publications

There Will Be a Scientific Theory of Deep Learning

Jamie Simon, Daniel Kunin, Alexander Atanasov, Enric Boix-Adserà, Blake Bordelon, J. Cohen, N. Ghosh, F. Guth, Arthur Jacot, Mason Kamb, Dhruva Karkada, Eric J. Michaud, Berkan Ottlik, Joseph Turnbull

In this paper, we make the case that a scientific theory of deep learning is emerging. By this we mean a theory which
characterizes important properties and statistics of the training process, hidden representations, final weights, and
performance of neural networks. We pull together major strands of ongoing research in deep learning theory and identify
five growing bodies of work that point toward such a theory:
1. solvable idealized settings that provide intuition for learning dynamics in realistic systems;
2. tractable limits that reveal insights into fundamental learning phenomena;
3. simple mathematical laws that capture important macroscopic observables;
4. theories of hyperparameters that disentangle them from the rest of the training process, leaving simpler systems behind; and
5. universal behaviors shared across systems and settings which clarify which phenomena call for explanation.
Taken together, these bodies of work share certain broad traits: they are concerned with the dynamics of the training
process; they primarily seek to describe coarse aggregate statistics; and they emphasize falsifiable quantitative predictions.
We argue that the emerging theory is best thought of as a mechanics of the learning process, and suggest the name learning
mechanics. We assert that learning mechanics should be a mathematical theory, grounded in first-principles calculations
that closely predict empirics, reliant on well-tested approximations and assumptions, and aiming for broad
impact across the machine learning stack once it reaches maturity.
We discuss the relationship between this mechanics perspective and other approaches for building a theory of deep
learning, including the statistical and information-theoretic perspectives. In particular, we anticipate a symbiotic and
mutually supportive relationship between learning mechanics and the developing discipline of mechanistic interpretability.
Where mechanistic interpretability aims to be the biology of deep learning, learning mechanics should aspire to be its
physics, mirroring the complementary relationship between biology and physics in the natural sciences.
We also review and address common arguments that fundamental theory will not be possible or is not important. We
conclude with a portrait of important open directions in learning mechanics and advice for beginners. We host further
introductory materials, perspectives, and open questions at learningmechanics.pub.

April 23, 2026

Modulating Cross-Modal Convergence with Single-Stimulus, Intra-Modal Dispersion

Eghbal A. Hosseini, Brian Cheung, Evelina Fedorenko, A. Williams

Neural networks exhibit a remarkable degree of representational convergence across diverse architectures, training objectives, and even data modalities. This convergence is predictive of alignment with brain representations. A recent hypothesis suggests that it arises because networks learn the underlying structure of the environment in similar ways. However, it is unclear how individual stimuli elicit convergent representations across networks: an image can be perceived in multiple ways and expressed differently in words. Here, we introduce a methodology based on the Generalized Procrustes Algorithm to measure intra-modal representational convergence at the single-stimulus level. We applied it to vision models with distinct training objectives, selecting stimuli based on their degree of alignment (intra-modal dispersion). Crucially, we found that this intra-modal dispersion strongly modulates alignment between vision and language models (cross-modal convergence). Specifically, stimuli with low intra-modal dispersion (high agreement among vision models) elicited significantly higher cross-modal alignment than those with high dispersion, by up to a factor of two (e.g., in pairings of DINOv2 with language models). This effect was robust to stimulus selection criteria and generalized across different pairings of vision and language models. Measuring convergence at the single-stimulus level provides a path toward understanding the sources of convergence and divergence across modalities, and between neural networks and human neural representations.
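
As a rough illustration of the approach, the sketch below aligns several models' representations of the same stimuli with a generalized Procrustes procedure (via scipy.linalg.orthogonal_procrustes) and scores each stimulus by its dispersion across models. The preprocessing, iteration count, and toy data are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def gpa_dispersion(reps, n_iter=10):
    """Align (n_stimuli x d) representation matrices with a generalized
    Procrustes procedure and return per-stimulus dispersion across models.
    `reps`: list of arrays, one per model, all (n, d) after any
    dimensionality matching; a sketch, not the paper's exact pipeline."""
    # Center and Frobenius-normalize to remove trivial differences.
    X = [r - r.mean(0) for r in reps]
    X = [r / np.linalg.norm(r) for r in X]
    mean = X[0].copy()
    for _ in range(n_iter):
        # Rotate every configuration onto the current consensus.
        X = [x @ orthogonal_procrustes(x, mean)[0] for x in X]
        mean = np.mean(X, axis=0)
    stack = np.stack(X)                            # (n_models, n, d)
    # Dispersion: mean squared distance of each stimulus to the consensus.
    return ((stack - mean) ** 2).sum(-1).mean(0)   # (n,)

# Toy usage: three "models" as random rotations of shared structure + noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 16))
reps = [base @ np.linalg.qr(rng.normal(size=(16, 16)))[0]
        + 0.1 * rng.normal(size=(100, 16)) for _ in range(3)]
disp = gpa_dispersion(reps)
print(disp.shape, disp.mean())
```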

April 23, 2026

Estimating Dimensionality of Neural Representations from Finite Samples

C. Chun, A. Canatar, S. Chung, Daniel Lee

The global dimensionality of a neural representation manifold provides rich insight into the computational process underlying both artificial and biological neural networks. However, all existing measures of global dimensionality are sensitive to the number of samples, i.e., the number of rows and columns of the sample matrix. We show that, in particular, the participation ratio of eigenvalues, a popular measure of global dimensionality, is highly biased at small sample sizes, and propose a bias-corrected estimator that is more accurate with finite samples and under noise. On synthetic data examples, we demonstrate that our estimator can recover the true known dimensionality. We apply our estimator to neural recordings, including calcium imaging, electrophysiological recordings, and fMRI data, and to the neural activations of a large language model, and show that our estimator is invariant to sample size. Finally, our estimator can also be used to measure the local dimensionality of curved neural manifolds by weighting the finite samples appropriately.
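
For concreteness, the sketch below computes the plug-in participation ratio, PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues) of the sample covariance, and shows its sample-size dependence on synthetic data with a known spectrum. The paper's bias-corrected estimator itself is not reproduced here; the spectrum and sample sizes are illustrative.

```python
import numpy as np

def participation_ratio(X):
    """Plug-in participation ratio (sum lam)^2 / sum lam^2 of the sample
    covariance eigenvalues; X is (n_samples, n_features)."""
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    return lam.sum() ** 2 / (lam ** 2).sum()

# Ground truth: Gaussian data with a geometric eigenvalue spectrum.
rng = np.random.default_rng(1)
d = 200
lam_true = 0.9 ** np.arange(d)
pr_true = lam_true.sum() ** 2 / (lam_true ** 2).sum()

for n in (50, 200, 1000, 10000):
    X = rng.normal(size=(n, d)) * np.sqrt(lam_true)
    print(f"n={n:6d}  plug-in PR = {participation_ratio(X):.2f}")
print(f"true PR = {pr_true:.2f}")  # the plug-in estimate is biased at small n
```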

Quasi Monte Carlo methods enable extremely low-dimensional deep generative models

Miles Martinez, A. Williams

This paper introduces quasi-Monte Carlo latent variable models (QLVMs): a class of deep generative models specialized for finding extremely low-dimensional and interpretable embeddings of high-dimensional datasets. Unlike standard approaches, which rely on a learned encoder and variational lower bounds, QLVMs directly approximate the marginal likelihood by randomized quasi-Monte Carlo integration. While this brute-force approach has drawbacks in higher-dimensional spaces, we find that it excels at fitting one-, two-, and three-dimensional deep latent variable models. Empirical results on a range of datasets show that QLVMs consistently outperform conventional variational autoencoders (VAEs) and importance weighted autoencoders (IWAEs) with matched latent dimensionality. The resulting embeddings enable transparent visualization and post hoc analyses such as nonparametric density estimation, clustering, and geodesic path computation, which are nontrivial to validate in higher-dimensional spaces. While our approach is compute-intensive and struggles to generate fine-scale details in complex datasets, it offers a compelling solution for applications prioritizing interpretability and latent space analysis.
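
A minimal sketch of the core idea, assuming SciPy's scipy.stats.qmc.Sobol generator and a toy linear decoder standing in for the trained network: the marginal likelihood is approximated by averaging the decoder likelihood over scrambled Sobol points pushed through the standard-normal inverse CDF.

```python
import numpy as np
from scipy.stats import qmc, norm

def qmc_log_marginal(x, decoder, d_latent=2, m=12, seed=0):
    """log p(x) ~= log mean_z p(x | z), with z drawn from a scrambled Sobol
    sequence mapped to the N(0, I) prior. `decoder(z)` returns the mean of a
    unit-variance Gaussian likelihood; here it is a toy stand-in, not a
    trained QLVM decoder."""
    sob = qmc.Sobol(d=d_latent, scramble=True, seed=seed)
    u = sob.random_base2(m=m)          # 2^m low-discrepancy points in [0, 1)^d
    z = norm.ppf(u)                    # push through the prior's inverse CDF
    mu = decoder(z)                    # (2^m, d_data)
    # log N(x; mu, I) evaluated at every quadrature node.
    ll = -0.5 * ((x - mu) ** 2).sum(-1) - 0.5 * x.size * np.log(2 * np.pi)
    # Stable log-mean-exp over the nodes.
    return np.logaddexp.reduce(ll) - np.log(len(ll))

# Toy example: a fixed linear "decoder" from 2 latents to 8 data dimensions.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 8))
decoder = lambda z: z @ W
x = decoder(rng.normal(size=(1, 2)))[0] + rng.normal(size=8)
print(qmc_log_marginal(x, decoder))
```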

Partial Soft-Matching Distance for Neural Representational Comparison with Partial Unit Correspondence

Chaitanya Kapoor, A. Williams, Meenakshi Khosla

Representational similarity metrics typically force all units to be matched, making them susceptible to the noise and outliers common in neural representations. We extend the soft-matching distance to a partial optimal transport setting that allows some neurons to remain unmatched, yielding rotation-sensitive but robust correspondences. This partial soft-matching distance provides theoretical advantages (relaxing strict mass conservation while maintaining interpretable transport costs) and practical benefits, enabling efficient ranking of neurons by cross-network alignment without costly iterative recomputation. In simulations, it preserves correct matches under outliers and reliably selects the correct model in noise-corrupted identification tasks. On fMRI data, it automatically excludes low-reliability voxels and produces voxel rankings by alignment quality that closely match computationally expensive brute-force approaches. It achieves higher alignment precision across homologous brain areas than standard soft-matching, which is forced to match all units regardless of quality. In deep networks, highly matched units exhibit similar maximally exciting images, while unmatched units show divergent patterns. This ability to partition by match quality enables focused analyses, e.g., testing whether networks have privileged axes even within their most aligned subpopulations. Overall, partial soft-matching provides a principled and practical method for representational comparison under partial correspondence.
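
The sketch below shows one way such a partial matching could be set up, assuming the POT package (pip install pot) and its ot.partial.partial_wasserstein solver; the cost matrix and mass fraction are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist
import ot  # POT: Python Optimal Transport; its partial-OT solver is assumed

def partial_soft_match(A, B, frac=0.8):
    """Partial soft-matching between units of two networks.
    A: (n_stimuli, p), B: (n_stimuli, q) activation matrices. Only `frac`
    of the unit mass must be matched, so outlier units can stay unmatched.
    A sketch in the spirit of the abstract, not the authors' code."""
    # Cost between units: squared distance of z-scored response profiles.
    Az = (A - A.mean(0)) / (A.std(0) + 1e-8)
    Bz = (B - B.mean(0)) / (B.std(0) + 1e-8)
    M = cdist(Az.T, Bz.T, "sqeuclidean")              # (p, q)
    p, q = M.shape
    a, b = np.full(p, 1.0 / p), np.full(q, 1.0 / q)
    gamma = ot.partial.partial_wasserstein(a, b, M, m=frac)
    return (gamma * M).sum(), gamma.sum(1)  # cost, matched mass per A-unit

# Toy check: shared signal plus two pure-noise "outlier" units in A.
rng = np.random.default_rng(0)
S = rng.normal(size=(500, 8))
A = np.hstack([S + 0.1 * rng.normal(size=S.shape),
               rng.normal(size=(500, 2))])            # 8 matched + 2 outliers
B = S + 0.1 * rng.normal(size=S.shape)
dist, mass = partial_soft_match(A, B, frac=0.8)
print(dist, mass.round(2))  # the outlier units should retain little mass
```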

A stabilized finite element formulation for simulating ordered arrays of immersed flexible fibers with applications in cellular mechanics

We present a new computational tool for the simulation of aligned assemblies of thin, bendable, but inextensible fibers immersed in a linear Stokes fluid. Such systems are of great importance in cellular mechanics because they arise in many intracellular (e.g., cytoskeleton-cytoplasm interactions) and extracellular (e.g., ciliary locomotion) microscale biological processes. The fiber bed is represented as an anisotropic poroelastic medium that obeys Euler-Bernoulli beam theory and is hydrodynamically coupled to the viscous fluid through local slender-body theory. We develop two methodologies to solve the resulting fluid-structure interaction problem: (1) a classical approach in which the solid is solved in the Lagrangian frame and the fluid is solved using an Arbitrary Lagrangian-Eulerian (ALE) method, and (2) a novel approach in which the solid equations are expressed in the Eulerian frame and the fiber-fluid system is solved together using an ALE method. In both cases, the resulting set of equations is approximated using a Petrov-Galerkin stabilized finite element method specifically designed for the fiber-fluid interaction problem. Equal-order continuous finite elements are used for the spatial discretization of the deforming physical domain, and finite differences are used for temporal discretization. Both approaches are shown to be numerically stable and convergent at the expected order, and the pure ALE method can additionally resolve extreme fiber deformations without the need for mesh reconstruction. Finally, our methods are validated by direct comparison to discrete fiber simulations in two benchmark tests: (a) the shearing of an anchored fiber bed and (b) the emergence and evolution of cell-spanning vortices in cellular geometries.
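
As a much-reduced illustration of the beam-plus-drag ingredient of this formulation, the sketch below relaxes a single pinned fiber under overdamped Euler-Bernoulli dynamics, zeta * dw/dt = -B * d4w/ds4, using explicit finite differences rather than the paper's stabilized finite element machinery; all parameters are illustrative.

```python
import numpy as np

# Overdamped relaxation of one fiber: zeta * dw/dt = -B * d4w/ds4
# (Euler-Bernoulli bending resisted by local drag). Pinned ends
# (w = w'' = 0); parameters are illustrative, not from the paper.
n, B_over_zeta = 33, 1.0
s = np.linspace(0.0, 1.0, n)
h = s[1] - s[0]
dt = h**4 / (10 * B_over_zeta)       # respects the explicit stability bound
w = 0.01 * np.sin(np.pi * s)         # initial transverse deflection

for _ in range(50_000):
    # Odd extension with ghost points enforces the pinned boundary conditions.
    we = np.concatenate(([-w[1]], w, [-w[-2]]))
    d4 = (we[:-4] - 4*we[1:-3] + 6*we[2:-2] - 4*we[3:-1] + we[4:]) / h**4
    w[1:-1] -= dt * B_over_zeta * d4
    w[0] = w[-1] = 0.0

# The slowest bending mode decays as exp(-(B/zeta) * pi^4 * t).
t = 50_000 * dt
print(w.max() / 0.01, np.exp(-np.pi**4 * t))  # these should roughly agree
```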

A Network of Biologically Inspired Rectified Spectral Units (ReSUs) Learns Hierarchical Features Without Error Backpropagation

S. Qin, J. Pughe-Sanford, A. Genkin, Pembe Gizem Ozdil, P. Greengard, A. Sengupta, D. Chklovskii

We introduce a biologically inspired, multilayer neural architecture composed of Rectified Spectral Units (ReSUs). Each ReSU projects a recent window of its input history onto a canonical direction obtained via canonical correlation analysis (CCA) of previously observed past-future input pairs, and then rectifies either its positive or negative component. By encoding canonical directions in synaptic weights and temporal filters, ReSUs implement a local, self-supervised algorithm for progressively constructing increasingly complex features.
To evaluate both computational power and biological fidelity, we trained a two-layer ReSU network in a self-supervised regime on translating natural scenes. First-layer units, each driven by a single pixel, developed temporal filters resembling those of Drosophila post-photoreceptor neurons (L1/L2 and L3), including their empirically observed adaptation to signal-to-noise ratio (SNR). Second-layer units, which pooled spatially over the first layer, became direction-selective -- analogous to T4 motion-detecting cells -- with learned synaptic weight patterns approximating those derived from connectomic reconstructions.
Together, these results suggest that ReSUs offer (i) a principled framework for modeling sensory circuits and (ii) a biologically grounded, backpropagation-free paradigm for constructing deep self-supervised neural networks.
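
The following sketch builds one ReSU-style unit along the lines described above: CCA between past and future windows of a scalar input (via whitening and an SVD of the cross-covariance), projection of the current past window onto the top canonical direction, and half-wave rectification. The window length and toy input are assumptions, not the paper's setup.

```python
import numpy as np

def resu_unit(x, k=8, sign=1.0):
    """One ReSU-style unit: CCA between past and future windows of x,
    projection onto the top canonical direction, then rectification."""
    T = len(x) - 2 * k
    past = np.stack([x[t:t + k] for t in range(T)])            # (T, k)
    future = np.stack([x[t + k:t + 2 * k] for t in range(T)])  # (T, k)
    past = past - past.mean(0)
    future = future - future.mean(0)
    # CCA via thin-SVD whitening + SVD of the whitened cross-covariance.
    Up, Sp, Vpt = np.linalg.svd(past, full_matrices=False)
    Uf, _, _ = np.linalg.svd(future, full_matrices=False)
    A, rho, _ = np.linalg.svd(Up.T @ Uf)
    w = Vpt.T @ (A[:, 0] / Sp)       # top canonical direction, input space
    r = np.maximum(sign * (past @ w), 0.0)   # rectified unit response
    return w, r, rho[0]

# Toy input: a slow latent drives the signal, so the past predicts the future.
rng = np.random.default_rng(0)
latent = np.cumsum(rng.normal(size=2000))
x = latent + 0.5 * rng.normal(size=2000)
w, r, rho1 = resu_unit(x)
print(f"top canonical correlation: {rho1:.2f}, responses: {r.shape}")
```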

Linear Readout of Neural Manifolds with Continuous Variables

Will Slatton, C. Chou, S. Chung

Brains and artificial neural networks compute with continuous variables such as object position or stimulus orientation. However, the complex variability in neural responses makes it difficult to link internal representational structure to task performance. We develop a statistical-mechanical theory of regression capacity that relates linear decoding efficiency of continuous variables to geometric properties of neural manifolds. Our theory handles complex neural variability and applies to real data, revealing increasing capacity for decoding object position and size along the monkey visual stream.
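
As a concrete instance of the setting this theory addresses, the sketch below linearly decodes a circular stimulus variable from noisy, von Mises-tuned population responses with a ridge readout; the tuning model, noise level, and ridge penalty are illustrative assumptions, and the capacity theory itself is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(-np.pi, np.pi, size=2000)        # stimulus variable
centers = rng.uniform(-np.pi, np.pi, size=200)       # neurons' preferred stimuli
X = np.exp(4.0 * np.cos(theta[:, None] - centers))   # von Mises tuning curves
X = X / X.max() + 0.1 * rng.normal(size=X.shape)     # trial-to-trial noise
Y = np.column_stack([np.sin(theta), np.cos(theta)])  # circular regression target

ntr = 1000                                           # train/test split
G = X[:ntr].T @ X[:ntr] + 1e-2 * np.eye(X.shape[1])
W = np.linalg.solve(G, X[:ntr].T @ Y[:ntr])          # ridge readout weights
pred = X[ntr:] @ W
theta_hat = np.arctan2(pred[:, 0], pred[:, 1])
err = np.angle(np.exp(1j * (theta_hat - theta[ntr:])))  # wrapped errors
print(f"median decoding error: {np.median(np.abs(err)):.3f} rad")
```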

March 11, 2026

Diagnosing Generalization Failures from Representational Geometry Markers

C. Chou, A. Kirsanov, Yao-Yuan Yang, S. Chung

Generalization, the ability to perform well beyond the training context, is a hallmark of biological and artificial intelligence, yet anticipating unseen failures remains a central challenge. Conventional approaches often take a "bottom-up" mechanistic route, reverse-engineering interpretable features or circuits to build explanatory models. While insightful, these methods often struggle to provide the high-level predictive signals needed to anticipate failure in real-world deployment. Here, we propose a "top-down" approach to studying generalization failures, inspired by medical biomarkers: identifying system-level measurements that serve as robust indicators of a model's future performance. Rather than mapping out detailed internal mechanisms, we systematically design and test network markers to probe structure-function links, identify prognostic indicators, and validate predictions in real-world settings. In image classification, we find that task-relevant geometric properties of in-distribution (ID) object manifolds consistently forecast poor out-of-distribution (OOD) generalization. In particular, reductions in two geometric measures, effective manifold dimensionality and utility, predict weaker OOD performance across diverse architectures, optimizers, and datasets. We apply this finding to transfer learning with ImageNet-pretrained models and consistently find that the same geometric patterns predict OOD transfer performance more reliably than ID accuracy. This work demonstrates that representational geometry can expose hidden vulnerabilities, offering more robust guidance for model selection and AI interpretability.
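
To make the marker idea concrete, the sketch below computes one such geometric measure, a per-class effective dimensionality (the participation ratio of the within-class covariance spectrum), from a feature matrix; the paper's "utility" measure and its exact definitions are not reproduced here, and the toy features are illustrative.

```python
import numpy as np

def manifold_effective_dims(features, labels):
    """Per-class effective dimensionality: participation ratio of the
    within-class covariance eigenvalues of penultimate-layer features.
    One marker in the spirit of the paper, not its exact definition."""
    dims = {}
    for c in np.unique(labels):
        Z = features[labels == c]
        lam = np.linalg.eigvalsh(np.cov(Z, rowvar=False))
        lam = np.clip(lam, 0.0, None)          # guard tiny negative eigenvalues
        dims[c] = lam.sum() ** 2 / (lam ** 2).sum()
    return dims

# Toy usage with random "features"; in practice, compute this on a trained
# model's ID-validation features and correlate the marker with OOD accuracy.
rng = np.random.default_rng(0)
F = rng.normal(size=(500, 64))
y = rng.integers(0, 5, size=500)
print(manifold_effective_dims(F, y))
```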
