2697 Publications

Understanding Factual Recall in Transformers via Associative Memories

Eshaan Nichani, Jason D. Lee, A. Bietti

Large language models have demonstrated an impressive ability to perform factual recall. Prior work has found that transformers trained on factual recall tasks can store information at a rate proportional to their parameter count. In our work, we show that shallow transformers can use a combination of associative memories to obtain such near-optimal storage capacity. We begin by proving that the storage capacities of both linear and MLP associative memories scale linearly with parameter count. We next introduce a synthetic factual recall task, and prove that a transformer with a single layer of self-attention followed by an MLP can obtain 100% accuracy on the task whenever either the total number of self-attention parameters or the total number of MLP parameters scales (up to log factors) linearly with the number of facts. In particular, the transformer can trade off between using the value matrices or the MLP as an associative memory to store the dataset of facts. We complement these expressivity results with an analysis of the gradient flow trajectory of a simplified linear attention model trained on our factual recall task, where we show that the model exhibits sequential learning behavior.
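
To make the idea of a linear associative memory concrete, here is a minimal sketch using the classical outer-product construction with random unit-norm embeddings. This is an illustration under stated assumptions, not necessarily the construction analyzed in the paper: the dimensions, the random embeddings, and the helper names (`build_linear_memory`, `recall`, `random_unit_vectors`) are all hypothetical.

```python
import numpy as np

def random_unit_vectors(rng, n, d):
    """Draw n random d-dimensional vectors and normalize each to unit length."""
    v = rng.standard_normal((n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def build_linear_memory(keys, values, targets):
    """Store the facts as W = sum_i values[targets[i]] keys[i]^T (a d x d matrix)."""
    return values[targets].T @ keys

def recall(W, keys, values):
    """Decode each key by picking the answer embedding with the largest score."""
    scores = values @ (W @ keys.T)  # shape: (num_answers, num_facts)
    return scores.argmax(axis=0)

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
d, num_facts, num_answers = 512, 40, 10
keys = random_unit_vectors(rng, num_facts, d)      # one embedding per fact input
values = random_unit_vectors(rng, num_answers, d)  # one embedding per answer
targets = rng.integers(0, num_answers, size=num_facts)

W = build_linear_memory(keys, values, targets)
accuracy = float(np.mean(recall(W, keys, values) == targets))
print(f"recall accuracy: {accuracy:.2f}")
```

Because random high-dimensional embeddings are nearly orthogonal, the cross-talk between stored facts stays small relative to the signal when the number of facts is modest compared with the embedding dimension.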

CyclicCAE: A Conformational Autoencoder for Efficient Heterochiral Macrocyclic Backbone Sampling

Andrew C. Powers, D. Renfrew, Parisa Hosseinzadeh, V. Mulligan

Macrocycles are a promising therapeutic class. However, the incorporation of heterochiral and non-natural chemical building blocks presents challenges for rational design. With no existing machine learning methods tailored for heterochiral macrocycle design, we developed a novel convolutional autoencoder model to rapidly generate energetically favorable macrocycle backbones for heterochiral design and structure prediction. Our approach surpasses the current state-of-the-art method, Generalized Kinematic loop closure (GenKIC), in the Rosetta software suite. Given the absence of large, available macrocycle datasets, we created a custom dataset in-house and in silico. Our model, CyclicCAE, produces energetically stable backbones and designable structures more rapidly than GenKIC. It enables users to perform energy minimization, generate structurally similar or diverse inputs via MCMC, and conduct inpainting with fixed anchors or motifs. We propose that this novel method will accelerate the development of stable macrocycles, speeding up macrocycle drug design pipelines.

February 27, 2025

Spectral Analysis of Representational Similarity with Limited Neurons

Hyunmo Kang, A. Canatar, S. Chung

Measuring representational similarity between neural recordings and computational models is challenging due to constraints on the number of neurons that can be recorded simultaneously. In this work, we investigate how such limitations affect similarity measures, focusing on Canonical Correlation Analysis (CCA) and Centered Kernel Alignment (CKA). Leveraging tools from Random Matrix Theory, we develop a predictive spectral framework for these measures and demonstrate that finite neuron sampling systematically underestimates similarity due to eigenvector delocalization. To overcome this, we introduce a denoising method to infer population-level similarity, enabling accurate analysis even with small neuron samples. Our theory is validated on synthetic and real datasets, offering practical strategies for interpreting neural data under finite sampling constraints.
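
As a rough illustration of the kind of measurement discussed above, the following sketch computes linear CKA between a synthetic "population" and a synthetic "model", once with all neurons and once with a small random subset. The data-generating setup, the sizes, and the function name `linear_cka` are assumptions made for this example. It implements only the standard linear CKA formula, not the spectral theory or the denoising method from the paper.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two (stimuli x units) response matrices."""
    # Center each unit's responses across stimuli.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (norm_x * norm_y)

# Hypothetical synthetic data: both systems read out the same 10 latents.
rng = np.random.default_rng(0)
num_stimuli, num_neurons, num_model_units = 200, 500, 64
latents = rng.standard_normal((num_stimuli, 10))
population = latents @ rng.standard_normal((10, num_neurons))
population += 0.5 * rng.standard_normal((num_stimuli, num_neurons))
model = latents @ rng.standard_normal((10, num_model_units))

full_cka = linear_cka(population, model)

# Mimic a recording that captures only 20 of the 500 neurons.
subset = rng.choice(num_neurons, size=20, replace=False)
sampled_cka = linear_cka(population[:, subset], model)

print(f"CKA with all neurons: {full_cka:.3f}")
print(f"CKA with 20 neurons:  {sampled_cka:.3f}")
```

Repeating the subsampling over many random subsets and subset sizes is one simple way to see how the estimate depends on the number of recorded neurons.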

February 27, 2025

Type-I Superconductors in the Limit as the London Penetration Depth Goes to 0

C. Epstein, M. Rachh, Yuguan Wang

This paper provides an explicit formula for the approximate solution of the static London equations. These equations describe the currents and magnetic fields in a Type-I superconductor. We represent the magnetic field as a 2-form and the current as a 1-form, and assume that the superconducting material is contained in a bounded, connected set, Ω, with smooth boundary. The London penetration depth gives an estimate for the thickness of the layer near ∂Ω where the current is largely carried. In an earlier paper, we introduced a system of Fredholm integral equations of the second kind, on ∂Ω, for solving the physically relevant scattering problems in this context. In real Type-I superconductors the penetration depth is very small, typically about 100 nm, which often renders the integral equation approach computationally intractable. In this paper we provide an explicit formula for approximate solutions, with essentially optimal error estimates, as the penetration depth tends to zero. Our work makes extensive use of the Hodge decomposition of differential forms on manifolds with boundary, and thus evokes Kohn's work on the tangential Cauchy-Riemann equations.
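
To see why the penetration depth sets the thickness of the current-carrying layer, it may help to recall the textbook half-space calculation. The sketch below uses vector-calculus notation in SI units rather than the differential-forms language of the paper. The symbols $\lambda_L$ (penetration depth), $B_0$ (applied field at the surface), and the planar geometry are assumptions of this illustration, not the paper's general setting.

```latex
% Static London equations (SI units), with \mu_0 the vacuum permeability:
\nabla \times \mathbf{J} = -\frac{1}{\mu_0 \lambda_L^2}\,\mathbf{B},
\qquad
\nabla \times \mathbf{B} = \mu_0 \mathbf{J},
\qquad
\nabla \cdot \mathbf{B} = 0 .

% Taking the curl of the second equation and using the other two:
\nabla^2 \mathbf{B} = \frac{1}{\lambda_L^2}\,\mathbf{B} .

% Superconductor filling x > 0, applied field B_0 \hat{z} parallel to the surface:
\mathbf{B}(x) = B_0\, e^{-x/\lambda_L}\,\hat{\mathbf{z}},
\qquad
\mathbf{J}(x) = \frac{B_0}{\mu_0 \lambda_L}\, e^{-x/\lambda_L}\,\hat{\mathbf{y}} .
```

Both the field and the current decay exponentially over a distance $\lambda_L$ from the boundary, so as $\lambda_L \to 0$ the current concentrates in an ever thinner layer near the surface.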

Modeling Neural Activity with Conditionally Linear Dynamical Systems

Victor Geadah, A. Nejatbakhsh, D. Lipshutz, J. Pillow, A. Williams

Neural population activity exhibits complex, nonlinear dynamics, varying in time, over trials, and across experimental conditions. Here, we develop Conditionally Linear Dynamical System (CLDS) models as a general-purpose method to characterize these dynamics. These models use Gaussian Process (GP) priors to capture the nonlinear dependence of circuit dynamics on task and behavioral variables. Conditioned on these covariates, the data is modeled with linear dynamics. This allows for transparent interpretation and tractable Bayesian inference. We find that CLDS models can perform well even in severely data-limited regimes (e.g. one trial per condition) due to their Bayesian formulation and ability to share statistical power across nearby task conditions. In example applications, we apply CLDS to model thalamic neurons that nonlinearly encode heading direction and to model motor cortical neurons during a cued reaching task.
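
The phrase "conditionally linear" can be illustrated with a toy simulation in which the state update is linear in the state, while the dynamics matrix depends nonlinearly on a covariate. This is only a sketch under assumptions: the specific form of `dynamics_matrix` (a damped rotation whose angle varies smoothly with the covariate `u`) is hypothetical, and the sketch includes neither the GP priors nor the inference procedure described in the abstract.

```python
import numpy as np

def dynamics_matrix(u, decay=0.98):
    """Hypothetical A(u): a damped 2D rotation whose angle depends smoothly on u."""
    theta = 0.1 + 0.2 * np.sin(u)
    c, s = np.cos(theta), np.sin(theta)
    return decay * np.array([[c, -s], [s, c]])

def simulate(conditions, x0, noise_std=0.0, seed=0):
    """Roll out x_{t+1} = A(u_t) x_t + noise for a sequence of covariates u_t."""
    rng = np.random.default_rng(seed)
    states = [np.asarray(x0, dtype=float)]
    for u in conditions[:-1]:
        noise = noise_std * rng.standard_normal(2)
        states.append(dynamics_matrix(u) @ states[-1] + noise)
    return np.stack(states)

# A covariate (for example, a heading angle) sweeping through one full cycle.
conditions = np.linspace(0.0, 2.0 * np.pi, 50)
trajectory = simulate(conditions, x0=[1.0, 0.0])
print(trajectory.shape)
```

For any fixed value of `u` the update is an ordinary linear dynamical system, which is what keeps the interpretation transparent. The nonlinearity enters only through how the matrix changes with the covariate.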

Brain-Model Evaluations Need the NeuroAI Turing Test

J. Feather, Meenakshi Khosla, N. Apurva Ratan Murty, Aran Nayebi

What makes an artificial system a good model of intelligence? The classical test proposed by Alan Turing focuses on behavior, requiring that an artificial agent's behavior be indistinguishable from that of a human. While behavioral similarity provides a strong starting point, two systems with very different internal representations can produce the same outputs. Thus, in modeling biological intelligence, the field of NeuroAI often aims to go beyond behavioral similarity and achieve representational convergence between a model's activations and the measured activity of a biological system. This position paper argues that the standard definition of the Turing Test is incomplete for NeuroAI, and proposes a stronger framework called the "NeuroAI Turing Test", a benchmark that extends beyond behavior alone and additionally requires models to produce internal neural representations that are empirically indistinguishable from those of a brain up to measured individual variability, i.e. the difference between a computational model and the brain is no greater than the difference between one brain and another brain. While the brain is not necessarily the ceiling of intelligence, it remains the only universally agreed-upon example, making it a natural reference point for evaluating computational models. By proposing this framework, we aim to shift the discourse from loosely defined notions of brain inspiration to a systematic and testable standard centered on both behavior and internal representations, providing a clear benchmark for neuroscientific modeling and AI development.

February 22, 2025

Engineering anisotropic electrodynamics at the graphene/CrSBr interface

Graphene is a privileged 2D platform for hosting confined light-matter excitations known as surface plasmon polaritons (SPPs), as it possesses low intrinsic losses and a high degree of optical confinement. However, the isotropic nature of graphene limits its ability to guide and focus SPPs, making it less suitable than anisotropic elliptical and hyperbolic materials for polaritonic lensing and canalization. Here, we present graphene/CrSBr as an engineered 2D interface that hosts highly anisotropic SPP propagation across mid-infrared and terahertz energies. Using scanning tunneling microscopy, scattering-type scanning near-field optical microscopy, and first-principles calculations, we demonstrate mutual doping in excess of 10¹³ cm⁻² holes/electrons between the interfacial layers of graphene/CrSBr. SPPs in graphene activated by charge transfer interact with charge-induced electronic anisotropy in the interfacial doped CrSBr, leading to preferential SPP propagation along the quasi-1D chains that compose each CrSBr layer. This multifaceted proximity effect both creates SPPs and endows them with anisotropic propagation lengths that differ by an order of magnitude between the in-plane crystallographic axes of CrSBr.

Nongenetic adaptation by collective migration

Lam Vo, Fotios Avgidis, H. Mattingly, et al.

Cell populations must adjust their phenotypic composition to adapt to changing environments. One adaptation strategy is to maintain distinct phenotypic subsets within the population and to modulate their relative abundances via gene regulation. Another strategy involves genetic mutations, which can be augmented by stress-response pathways. Here, we studied how a migrating bacterial population regulates its phenotypic distribution to traverse diverse environments. We generated isogenic Escherichia coli populations with varying distributions of swimming behaviors and observed their phenotype distributions during migration in liquid and porous environments. We found that the migrating populations became enriched with high-performing swimming phenotypes in each environment, allowing the populations to adapt without requiring mutations or gene regulation. This adaptation is dynamic and rapid, reversing in a few doubling times when migration ceases. By measuring the chemoreceptor abundance distributions during migration toward different attractants, we demonstrated that adaptation acts on multiple chemotaxis-related traits simultaneously. These measurements are consistent with a general mechanism in which adaptation results from a balance between cell growth generating diversity and collective migration eliminating underperforming phenotypes. Thus, collective migration enables cell populations with continuous, multidimensional phenotypes to flexibly and rapidly adapt their phenotypic composition to diverse environmental conditions.

The ManifoldEM method for cryo-EM: a step-by-step breakdown accompanied by a modern Python implementation

A. A. Ojha, R. Blackwell, M. Astore, S. Hanson, et al.

Resolving continuous conformational heterogeneity in single-particle cryo-electron microscopy (cryo-EM) is a field in which new methods are now emerging regularly. Methods range from traditional statistical techniques to state-of-the-art neural network approaches. Such ongoing efforts continue to enhance the ability to explore and understand the continuous conformational variations in cryo-EM data. One of the first methods was the manifold embedding approach or ManifoldEM. However, comparing it with more recent methods has been challenging due to software availability and usability issues. In this work, we introduce a modern Python implementation that is user-friendly, orders of magnitude faster than its previous versions and designed with a developer-ready environment. This implementation allows a more thorough evaluation of the strengths and limitations of methods addressing continuous conformational heterogeneity in cryo-EM, paving the way for further community-driven improvements.

Accurate close interactions of Stokes spheres using lubrication-adapted image systems

Anna Broms, A. Barnett, Anna-Karin Tornberg

Stokes flows with near-touching rigid particles induce near-singular lubrication forces under relative motion, making their accurate numerical treatment challenging. With the aim of controlling the accuracy with a computationally cheap method, we present a new technique that combines the method of fundamental solutions (MFS) with the method of images. For rigid spheres, we propose to represent the flow using Stokeslet proxy sources on interior spheres, augmented by lines of image sources adapted to each near-contact to resolve lubrication. Source strengths are found by a least-squares solve at contact-adapted boundary collocation nodes. We include extensive numerical tests, and validate against reference solutions from a well-resolved boundary integral formulation. With fewer than 60 additional image sources per particle per contact, we show controlled uniform accuracy to three relative digits in surface velocities, and up to five digits in particle forces and torques, for all separations down to a thousandth of the radius. In the special case of flows around fixed particles, the proxy sphere alone gives controlled accuracy. A one-body preconditioning strategy allows acceleration with the fast multipole method, and hence close to linear scaling in the number of particles. This is demonstrated by solving problems of up to 2000 spheres on a workstation using only 700 proxy sources per particle.
