481 Publications

Learning to Recall with Transformers Beyond Orthogonal Embeddings

Mert Vural, A. Bietti, Mahdi Soltanolkotabi, D. Wu

Modern large language models (LLMs) excel at tasks that require storing and retrieving knowledge, such as factual recall and question answering. Transformers are central to this capability because they can encode information during training and retrieve it at inference. Existing theoretical analyses typically study transformers under idealized assumptions such as infinite data or orthogonal embeddings. In realistic settings, however, models are trained on finite datasets with non-orthogonal (random) embeddings. We address this gap by analyzing a single-layer transformer with random embeddings trained with (empirical) gradient descent on a simple token-retrieval task, where the model must identify an informative token within a length-L sequence and learn a one-to-one mapping from tokens to labels. Our analysis tracks the

On the randomized SVD in infinite dimensions

Daniel Kressner, D. Persson, André Uschmajew

Randomized methods, such as the randomized SVD (singular value decomposition) and Nyström approximation, are an effective way to compute low-rank approximations of large matrices. Motivated by applications to operator learning, Boullé and Townsend (FoCM, 2023) recently proposed an infinite-dimensional extension of the randomized SVD for a Hilbert–Schmidt operator A that invokes randomness through a Gaussian process with a covariance operator K. While the non-isotropy introduced by K allows one to incorporate prior information on A, an unfortunate choice may lead to unfavorable performance and large constants in the error bounds. In this work, we introduce a novel infinite-dimensional extension of the randomized SVD that does not require such a choice and enjoys error bounds that match those for the finite-dimensional case. Our extension implicitly uses isotropic random vectors, reflecting a choice commonly made in the finite-dimensional case. In fact, the theoretical results of this work show how the usual randomized SVD applied to a discretization of A approaches our infinite-dimensional extension as the discretization gets refined, both in terms of error bounds and the Wasserstein distance. We also present and analyze a novel extension of the Nyström approximation for self-adjoint positive semi-definite trace class operators.
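For orientation, the standard finite-dimensional randomized SVD that the abstract takes as its starting point can be sketched as below. This is a minimal NumPy illustration of the usual Gaussian-sketch scheme with an isotropic test matrix, not the infinite-dimensional extension proposed in the paper; the function name `randomized_svd` and the parameters `rank`, `oversample`, and `seed` are assumptions made for this example.

```python
import numpy as np


def randomized_svd(A, rank, oversample=10, seed=0):
    """Rank-`rank` approximation of A via a Gaussian sketch (illustrative only)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, n)          # sketch size with oversampling

    # Isotropic Gaussian test matrix: columns are i.i.d. N(0, I_n) vectors.
    Omega = rng.standard_normal((n, k))

    # Sample the range of A and orthonormalize the samples.
    Y = A @ Omega
    Q, _ = np.linalg.qr(Y)

    # Project A onto the sampled range and take a small dense SVD.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small

    # Truncate to the requested rank.
    return U[:, :rank], s[:rank], Vt[:rank]
```

For a matrix of exact rank at most `rank`, the product `(U * s) @ Vt` reproduces `A` up to rounding error; for general matrices the error depends on the decay of the singular values beyond `rank`.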

cryoJAX: A Cryo-electron Microscopy Image Simulation Library In JAX

Michael J. O'Brien, S. Hanson, D. Needleman, et al.

While cryo-electron microscopy (cryo-EM) has come to prominence in the last decade due to its ability to resolve biomolecular complexes at atomic resolution, advancements in experimental and computational methods have made cryo-EM promising for investigating intracellular organization and heterogeneous molecular states. A primary challenge for these alternative applications is the development of techniques for cryo-EM data analysis, which are very computationally demanding. To this end, it is advantageous to leverage advanced scientific computing frameworks for statistical analysis. One such framework is JAX, an emerging array-oriented Python numerical computing package for automatic differentiation and vectorization with a growing ecosystem for statistical inference and machine learning. We have developed cryoJAX, a cryo-EM image-simulation library for building computational data-analysis applications in JAX. CryoJAX is a flexible modeling language for cryo-EM image formation and therefore can support a wide range of data analysis downstream. By integrating with the JAX ecosystem, cryoJAX enables the development and deployment of algorithms for the growing breadth of scientific applications for cryo-EM.

A fast algorithm for the wave equation using time-windowed Fourier projection

We introduce a new arbitrarily high-order method for the rapid evaluation of hyperbolic potentials (space-time integrals involving the Green’s function for the scalar wave equation). With $M$ points in the spatial discretization and $N_t$ time steps of size $\Delta t$, a naive implementation would require $O(M^2 N_t^2)$ work in dimensions where the weak Huygens’ principle applies. We avoid this all-to-all interaction using a smoothly windowed decomposition into a local part, treated directly, plus a history part, approximated by an $N_F$-term Fourier series. In one dimension, our method requires $O((M + N_F \log N_F) N_t)$ work, with $N_F = O(1/\Delta t)$, by exploiting the non-uniform fast Fourier transform. We demonstrate the method’s performance for time-domain scattering problems involving a large number $M$ of springs (point scatterers) attached to a vibrating string at arbitrary locations, with either periodic or free-space boundary conditions. We typically achieve 10-digit accuracy, and include tests for $M$ up to a million.

A Lightweight, Geometrically Flexible Fast Algorithm for the Evaluation of Layer and Volume Potentials

F. Fryklund, L. Greengard, S. Jiang, Samuel Potter

Over the last two decades, several fast, robust, and high-order accurate methods have been developed for solving the Poisson equation in complicated geometry using potential theory. In this approach, rather than discretizing the partial differential equation itself, one first evaluates a volume integral to account for the source distribution within the domain, followed by solving a boundary integral equation to impose the specified boundary conditions. Here, we present a new fast algorithm which is easy to implement and compatible with virtually any discretization technique, including unstructured domain triangulations, such as those used in standard finite element or finite volume methods. Our approach combines earlier work on potential theory for the heat equation, asymptotic analysis, the nonuniform fast Fourier transform (NUFFT), and the dual-space multilevel kernel-splitting (DMK) framework. It is insensitive to flaws in the triangulation, permitting not just nonconforming elements, but arbitrary aspect ratio triangles, gaps and various other degeneracies. On a single CPU core, the scheme computes the solution at a rate comparable to that of the fast Fourier transform (FFT) in work per gridpoint.

Low-temperature transport in high-conductivity correlated metals: a density-functional plus dynamical mean-field study of cubic perovskites

H. LaBollita, Jeremy Lee-Hand, Fabian B. Kugler, Lorenzo Van Muñoz, S. Beck, A. Hampel, J. Kaye, A. Georges, Cyrus E. Dreyer

While methods based on density-functional perturbation theory have dramatically improved our understanding of electron-phonon contributions to transport in materials, methods for accurately capturing electron-electron scattering relevant to low temperatures have seen significantly less development. The case of high-conductivity, moderately correlated materials characterized by low scattering rates is particularly challenging, since exquisite numerical precision of the low-energy electronic structure is required. Recent methodological advancements to density-functional theory combined with dynamical mean-field theory (DFT+DMFT), including adaptive Brillouin-zone integration and numerically precise self-energies, enable a rigorous investigation of electron-electron scattering in such materials. In particular, these tools may be leveraged to perform a robust scattering-rate analysis on both real- and imaginary-frequency axes. Applying this methodology to a subset of ABO$_3$ perovskite oxides -- SrVO$_3$, SrMoO$_3$, PbMoO$_3$, and SrRuO$_3$ -- we demonstrate its ability to qualitatively and quantitatively describe electron-electron contributions to the temperature-dependent direct-current resistivity. This combination of numerical techniques offers fundamental insight into the role of electronic correlations in transport phenomena and provides a predictive tool for identifying materials with potential for technological applications.

Macroscopic approximation of tight-binding models near spectral degeneracies and validity for wave packet propagation

Guillaume Bal, Paul Cazeaux, Daniel Massatt, S. Quinn

This paper concerns the derivation and validity of macroscopic descriptions of wave packets supported in the vicinity of degenerate points (K,E) in the dispersion relation of tight-binding models accounting for macroscopic variations. We show that such wave packets are well approximated over long times by macroscopic models with varying orders of accuracy. Our main applications are in the analysis of single- and multilayer graphene tight-binding Hamiltonians modeling macroscopic variations such as those generated by shear or twist. Numerical simulations illustrate the theoretical findings.

Understanding the Mechanisms of Fast Hyperparameter Transfer

The growing scale of deep learning models has rendered exhaustive hyperparameter (HP) optimization prohibitively expensive. A promising solution is the use of scale-aware HPs, which can enable direct transfer of optimal settings from small-scale grid searches to large models with minimal performance loss. Such approaches are useful when the optimal settings converge "fast" enough with scale. While approaches like the Maximal Update Parameterization (μP) have empirically displayed fast transfer when scaling model width, a deeper conceptual understanding of the mechanisms that enable this is still missing. Our work establishes a systematic conceptual framework for analyzing fast HP transfer across different synthetic and practical scenarios. In synthetic settings, we present various quantitative examples where transfer either offers a provable computational advantage or fails even under μP. We then propose a key property that enables the fast transfer often observed in practice: through a novel decomposition of the optimization trajectory, we identify one component that rapidly converges with model width and determines the optimal HPs, and another that continues to improve the loss with increased width but has negligible impact on HP choice. We conjecture that this decomposition elucidates the key mechanisms behind fast transfer and empirically validate it in practical settings such as LLM training.

Multiscale Clustering and Source Separation of InSight Mission Seismic Data

Ali Siahkoohi, R. Morel, R. Balestriero, Erwan Allys, Grégory Sainton, Taichi Kawamura

Unsupervised source separation involves unraveling an unknown set of source signals recorded through a mixing operator, with limited prior knowledge about the sources, and only access to a dataset of signal mixtures. This problem is inherently ill-posed and is further challenged by the variety of timescales exhibited by sources in time series data from planetary space missions. As such, a systematic multiscale unsupervised approach is needed to identify and separate sources at different timescales. Existing methods typically rely on a preselected window size that determines their operating timescale, limiting their capacity to handle multiscale sources. To address this issue, we propose an unsupervised multiscale clustering and source separation framework by leveraging wavelet scattering spectra that provide a low-dimensional representation of stochastic processes, capable of distinguishing between different non-Gaussian stochastic processes. Nested within this representation space, we develop a factorial variational autoencoder (VAE) that is trained to probabilistically cluster sources at different timescales. To perform source separation, we use samples from clusters at multiple timescales obtained via the factorial VAE (fVAE) as prior information, and formulate an optimization problem in the wavelet scattering spectra representation space. When applied to the entire seismic dataset recorded during the NASA Interior Exploration using Seismic Investigations, Geodesy and Heat Transport (InSight) mission on Mars, containing sources varying greatly in timescale, our approach disentangles such different sources, e.g., minute-long transient one-sided pulses (known as “glitches”) and structured ambient noises resulting from atmospheric activities that typically last for tens of minutes, and provides an opportunity to conduct further investigations into the isolated sources.

An $O(\log N)$ Monte Carlo method for periodic Coulomb systems

Efficient Monte Carlo (MC) sampling of many-body systems with long-range electrostatics is often limited by the cost of per-move energy-difference evaluation under periodic boundary conditions. We present DMK-MC, an accelerated MC method that adapts the dual-space multilevel kernel-splitting (DMK) framework to single-particle Metropolis updates. DMK-MC computes the energy change and, upon acceptance, updates the stored incoming plane-wave fields with $O(1)$ work per tree level, yielding an overall $O(\log N)$ expected work per trial move for fixed accuracy. The method decomposes the Coulomb kernel into three components: a global, periodized smooth part; a multilevel sequence of smooth difference kernels whose interactions are restricted to same-level colleague boxes; and a singular residual kernel whose short-range interactions are evaluated directly. Benchmarks on uniform, highly nonuniform, and implicit-solvent electrolyte and colloidal configurations show that DMK-MC consistently outperforms a recent FMM-based $O(\log N)$ Monte Carlo method, delivering several-fold speedups at comparable tolerances.
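To make the per-move cost concrete, the sketch below shows a baseline single-particle Metropolis update in which the energy difference is computed by a direct sum over all other particles, which costs O(N) per trial move. It is not DMK-MC: it uses a free-space 1/r interaction with no periodic images and no kernel splitting, and the names `energy_change`, `total_energy`, and `metropolis_step`, as well as the parameters `beta` and `step`, are assumptions made for illustration.

```python
import math
import random


def energy_change(pos, charges, i, new_xyz):
    """Direct O(N) energy difference for moving particle i to new_xyz.

    Assumes a free-space 1/r pair interaction (no periodic images).
    """
    dE = 0.0
    for j, (pj, qj) in enumerate(zip(pos, charges)):
        if j == i:
            continue
        r_old = math.dist(pos[i], pj)
        r_new = math.dist(new_xyz, pj)
        dE += charges[i] * qj * (1.0 / r_new - 1.0 / r_old)
    return dE


def total_energy(pos, charges):
    """O(N^2) reference energy, used here only to check energy_change."""
    E = 0.0
    for a in range(len(pos)):
        for b in range(a + 1, len(pos)):
            E += charges[a] * charges[b] / math.dist(pos[a], pos[b])
    return E


def metropolis_step(pos, charges, beta, step, rng):
    """One single-particle trial move with the Metropolis acceptance rule."""
    i = rng.randrange(len(pos))
    trial = tuple(x + rng.uniform(-step, step) for x in pos[i])
    dE = energy_change(pos, charges, i, trial)
    if dE <= 0.0 or rng.random() < math.exp(-beta * dE):
        pos[i] = trial          # accept: update the configuration in place
        return True, dE
    return False, 0.0           # reject: configuration unchanged
```

In this baseline every trial move touches all N particles; the abstract's claim is that the same energy difference can be obtained with O(log N) expected work by reusing stored fields across tree levels.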
