381 Publications

A fully adaptive, high-order, fast Poisson solver for complex two-dimensional geometries

We present a new framework for the fast solution of inhomogeneous elliptic boundary value problems in domains with smooth boundaries. High-order solvers based on adaptive box codes or the fast Fourier transform can efficiently treat the volumetric inhomogeneity, but require care to be taken near the boundary to ensure that the volume data is globally smooth. We avoid function extension or cut-cell quadratures near the boundary by dividing the domain into two regions: a bulk region away from the boundary that is efficiently treated with a truncated free-space box code, and a variable-width boundary-conforming strip region that is treated with a spectral collocation method and accompanying fast direct solver. Particular solutions in each region are then combined with Laplace layer potentials to yield the global solution. The resulting solver has an optimal computational complexity of O(N) for an adaptive discretization with N degrees of freedom. With an efficient two-dimensional (2D) implementation we demonstrate adaptive resolution of volumetric data, boundary data, and geometric features across a wide range of length scales, to typically 10-digit accuracy. The cost of all boundary corrections remains small relative to that of the bulk box code. The extension to 3D is expected to be straightforward in many cases because the strip

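For orientation, the bulk step alone can be caricatured by a uniform periodic FFT Poisson solve; this stands in for the paper's adaptive free-space box code and omits the boundary strip solver and layer-potential corrections entirely. A minimal NumPy sketch:

    import numpy as np

    def periodic_poisson_2d(f, L=1.0):
        """Solve -Laplace(u) = f on a periodic [0, L)^2 grid via the FFT.

        A much-simplified stand-in for the 'bulk' fast Poisson solve; the paper
        instead uses an adaptive free-space box code plus boundary-strip corrections.
        """
        n = f.shape[0]
        k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)   # Fourier wavenumbers
        kx, ky = np.meshgrid(k, k, indexing="ij")
        k2 = kx**2 + ky**2
        fh = np.fft.fft2(f)
        uh = np.zeros_like(fh)
        nonzero = k2 > 0
        uh[nonzero] = fh[nonzero] / k2[nonzero]        # zero-mean solution
        return np.real(np.fft.ifft2(uh))

    # Smooth right-hand side with a known solution for checking.
    n = 256
    x = np.linspace(0.0, 1.0, n, endpoint=False)
    X, Y = np.meshgrid(x, x, indexing="ij")
    u_exact = np.sin(2 * np.pi * X) * np.cos(4 * np.pi * Y)
    f = 20 * np.pi**2 * u_exact                        # -Laplace(u_exact)
    print(np.max(np.abs(periodic_poisson_2d(f) - u_exact)))  # near machine precision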

Understanding Optimization in Deep Learning with Central Flows

J. Cohen, Alex Damian, Ameet Talwalkar, J. Zico Kolter, Jason D. Lee

Optimization in deep learning remains poorly understood. A key difficulty is that optimizers exhibit complex oscillatory dynamics, referred to as "edge of stability," which cannot be captured by traditional optimization theory. In this paper, we show that the path taken by an oscillatory optimizer can often be captured by a central flow: a differential equation which directly models the time-averaged (i.e. smoothed) optimization trajectory. We empirically show that these central flows can predict long-term optimization trajectories for generic neural networks with a high degree of numerical accuracy. By interpreting these flows, we are able to understand how gradient descent makes progress even as the loss sometimes goes up; how adaptive optimizers "adapt" to the local loss landscape; and how adaptive optimizers implicitly seek out regions of weight space where they can take larger steps. These insights (and others) are not apparent from the optimizers' update rules, but are revealed by the central flows. Therefore, we believe that central flows constitute a promising tool for reasoning about optimization in deep learning.

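A cartoon of the phenomenon (not the paper's central-flow construction, which is derived for general network losses): gradient descent on a two-dimensional quadratic with one sharp and one flat direction oscillates in the sharp coordinate while the smoothed trajectory, which is what a central flow is meant to model, makes steady progress.

    import numpy as np

    # Cartoon only: L(x, y) = 0.5*(a*x**2 + b*y**2) with step size eta near the
    # stability threshold 2/a of the sharp direction. The x-coordinate flips sign
    # every step while y decays smoothly; the central-flow viewpoint is to model
    # the smoothed trajectory directly instead of the oscillation itself.
    a, b, eta = 1.95, 0.05, 1.0
    w = np.array([1.0, 1.0])
    for t in range(8):
        loss = 0.5 * (a * w[0] ** 2 + b * w[1] ** 2)
        print(f"t={t}  x={w[0]:+.3f}  y={w[1]:+.3f}  loss={loss:.4f}")
        w -= eta * np.array([a * w[0], b * w[1]])   # gradient descent step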

Discrete Lehmann representation of three-point functions

Dominik Kiese, Hugo U. R. Strand, Kun Chen, Nils Wentzell, Olivier Parcollet, J. Kaye

We present a generalization of the discrete Lehmann representation (DLR) to three-point correlation and vertex functions in imaginary time and Matsubara frequency. The representation takes the form of a linear combination of judiciously chosen exponentials in imaginary time, and products of simple poles in Matsubara frequency, which are universal for a given temperature and energy cutoff. We present a systematic algorithm to generate compact sampling grids, from which the coefficients of such an expansion can be obtained by solving a linear system. We show that the explicit form of the representation can be used to evaluate diagrammatic expressions involving infinite Matsubara sums, such as polarization functions or self-energies, with controllable, high-order accuracy. This collection of techniques establishes a framework through which methods involving three-point objects can be implemented robustly, with a substantially reduced computational cost and memory footprint.

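For background, the original two-point DLR is obtained by compressing the analytic-continuation kernel K(τ, ω) = e^{−τω}/(1 + e^{−βω}) (up to sign and normalization conventions); the paper extends this machinery to three-point objects. A rough sketch of the frequency-selection step, using a dense column-pivoted QR in place of the interpolative decompositions and adaptive fine grids used in production codes:

    import numpy as np
    from scipy.linalg import qr

    # Rough sketch of frequency selection for a Lehmann-type representation
    # (two-point case). Parameters are illustrative.
    beta, wmax, eps = 10.0, 1.0, 1e-10            # inverse temperature, cutoff, tolerance
    tau = np.linspace(0.0, beta, 400)             # imaginary-time grid
    omega = np.linspace(-wmax, wmax, 400)         # real-frequency grid
    K = np.exp(-tau[:, None] * omega[None, :]) / (1.0 + np.exp(-beta * omega[None, :]))

    # Column-pivoted QR ranks frequencies by how much new information each adds;
    # keep columns until the diagonal of R drops below the tolerance.
    Q, R, piv = qr(K, pivoting=True, mode="economic")
    r = int(np.sum(np.abs(np.diag(R)) > eps * np.abs(R[0, 0])))
    dlr_freqs = np.sort(omega[piv[:r]])
    print(f"rank-{r} representation; selected frequencies:\n{dlr_freqs}")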

Equispaced Fourier representations for efficient Gaussian process regression from a billion data points

Philip Greengard, M. Rachh, A. Barnett

We introduce a Fourier-based fast algorithm for Gaussian process regression in low dimensions. It approximates a translationally invariant covariance kernel by complex exponentials on an equispaced Cartesian frequency grid of \(M\) nodes. This results in a weight-space \(M\times M\) system matrix with Toeplitz structure, which can thus be applied to a vector in \({\mathcal O}(M \log{M})\) operations via the fast Fourier transform (FFT), independent of the number of data points \(N\). The linear system can be set up in \({\mathcal O}(N+M \log{M})\) operations using nonuniform FFTs. This enables efficient massive-scale regression via an iterative solver, even for kernels with fat-tailed spectral densities (large \(M\)). We provide bounds on both kernel approximation and posterior mean errors. Numerical experiments for squared-exponential and Matérn kernels in one, two, and three dimensions often show 1–2 orders of magnitude acceleration over state-of-the-art rank-structured solvers at comparable accuracy. Our method allows two-dimensional Matérn-\(\frac{3}{2}\) regression from \(N=10^9\) data points to be performed in two minutes on a standard desktop, with posterior mean accuracy \(10^{-3}\). This opens up spatial statistics applications 100 times larger than previously possible.

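The kernel approximation at the heart of the method can be checked directly in one dimension: the squared-exponential kernel is replaced by an equispaced trapezoid-rule Fourier sum of its spectral density. A small NumPy sketch (grid parameters chosen for illustration):

    import numpy as np

    # 1D squared-exponential kernel and its spectral density, in the convention
    # k(r) = \int khat(xi) exp(2*pi*i*xi*r) dxi.
    ell = 0.1
    k = lambda r: np.exp(-r**2 / (2 * ell**2))
    khat = lambda xi: ell * np.sqrt(2 * np.pi) * np.exp(-2 * np.pi**2 * ell**2 * xi**2)

    # Equispaced frequencies xi_j = j*h, |j| <= m: trapezoid-rule quadrature gives
    # k(r) ~ sum_j h * khat(xi_j) * exp(2*pi*i*xi_j*r).
    h, m = 0.5, 40                        # grid spacing and half-width (M = 2m+1 nodes)
    xi = h * np.arange(-m, m + 1)
    w = h * khat(xi)                      # quadrature weights

    r = np.linspace(-1.0, 1.0, 201)       # check accuracy over a range of lags
    k_approx = (w[None, :] * np.exp(2j * np.pi * np.outer(r, xi))).sum(axis=1).real
    print(np.max(np.abs(k_approx - k(r))))   # small uniform error on |r| <= 1

Because the entries of the resulting weight-space system depend only on frequency differences ξ_j − ξ_k, the M × M matrix is Toeplitz and can be applied to a vector with a padded FFT, which is what gives the O(M log M) cost per iteration quoted above.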

Uniqueness, regularity and characteristic flow for a non strictly convex singular variational problem

Jean-Francois Babadjian, G. Francfort

This work addresses the question of uniqueness and regularity of the minimizers of a convex but not strictly convex integral functional with linear growth in a two-dimensional setting. The integrand -- whose precise form derives directly from the theory of perfect plasticity -- behaves quadratically close to the origin and grows linearly once a specific threshold is reached. Thus, in contrast with the only existing literature on uniqueness for functionals with linear growth, namely that pertaining to the generalized least gradient, the integrand is not a norm. We make use of hyperbolic conservation laws hidden in the structure of the problem to tackle uniqueness. Our argument strongly relies on the regularity of a vector field -- the Cauchy stress in the terminology of perfect plasticity -- which allows us to define characteristic lines, and then to employ the method of characteristics. Using the detailed structure of the characteristic landscape evidenced in our preliminary study [BF], we show that this vector field is actually continuous, save for possibly two points. The different behaviors of the energy density at zero and at infinity imply an inequality constraint on the Cauchy stress. Under a barrier-type convexity assumption on the set where the inequality constraint is saturated, we show that uniqueness holds for pure Dirichlet boundary data devoid of any regularity properties, a stronger result than uniqueness for a given trace on the whole boundary, since our minimizers can fail to attain the boundary data. We also show a partial regularity result for the minimizer.

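As a concrete illustration of the stated growth behavior (a standard model integrand in the spirit of perfect plasticity, offered as an assumption about the general form rather than the paper's exact density):

    \[
      f(\xi) \;=\;
      \begin{cases}
        \tfrac{1}{2}\,|\xi|^{2}, & |\xi| \le 1,\\[2pt]
        |\xi| - \tfrac{1}{2}, & |\xi| > 1.
      \end{cases}
    \]

This f is convex, C^1, quadratic near the origin, and affine (hence not strictly convex) beyond the threshold; since it is not a norm, it falls outside the generalized least-gradient setting mentioned above.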

A mixing time bound for Gibbs sampling from log-smooth log-concave distributions

The Gibbs sampler, also known as the coordinate hit-and-run algorithm, is a Markov chain that is widely used to draw samples from probability distributions in arbitrary dimensions. At each iteration of the algorithm, a randomly selected coordinate is resampled from the distribution that results from conditioning on all the other coordinates. We study the behavior of the Gibbs sampler on the class of log-smooth and strongly log-concave target distributions supported on ℝ

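For intuition, here is a minimal random-scan Gibbs sampler in a case where the coordinate conditionals are available in closed form, a zero-mean Gaussian target (a member of the log-smooth, strongly log-concave class); the paper's mixing-time analysis concerns the general class rather than this special case.

    import numpy as np

    def gibbs_gaussian(Lambda, n_iters, rng):
        """Random-scan Gibbs sampling from N(0, Lambda^{-1}).

        At each iteration one coordinate i is chosen uniformly at random and
        resampled from its conditional, which for a Gaussian target is
        N(-sum_{j != i} Lambda_ij x_j / Lambda_ii, 1 / Lambda_ii).
        """
        d = Lambda.shape[0]
        x = np.zeros(d)
        samples = np.empty((n_iters, d))
        for t in range(n_iters):
            i = rng.integers(d)
            cond_mean = -(Lambda[i] @ x - Lambda[i, i] * x[i]) / Lambda[i, i]
            cond_std = 1.0 / np.sqrt(Lambda[i, i])
            x[i] = cond_mean + cond_std * rng.standard_normal()
            samples[t] = x
        return samples

    rng = np.random.default_rng(0)
    Lambda = np.array([[2.0, 0.8], [0.8, 1.0]])     # precision matrix (SPD)
    samples = gibbs_gaussian(Lambda, 50_000, rng)
    print(np.cov(samples[10_000:].T))               # approaches inv(Lambda)
    print(np.linalg.inv(Lambda))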

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

Jason D. Lee, Kazusato Oko, Taiji Suzuki, D. Wu

We study the problem of gradient descent learning of a single-index target function f∗(x) = σ∗(⟨x, θ⟩) under isotropic Gaussian data in ℝ^d, where the unknown link function σ∗ : ℝ → ℝ has information exponent p (defined as the lowest degree in the Hermite expansion). Prior works showed that gradient-based training of neural networks can learn this target with n ≳ d^{Θ(p)} samples, and such complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm (on the squared loss) learns f∗ with a complexity that is not governed by the information exponent. Specifically, for arbitrary polynomial single-index models, we establish a sample and runtime complexity of n ≃ T = Θ(d · polylog d), where Θ(·) hides a constant depending only on the degree of σ∗; this dimension dependence matches the information-theoretic limit up to polylogarithmic factors. More generally, we show that n ≳ d^{(p∗−1)∨1} samples are sufficient to achieve low generalization error, where p∗ ≤ p is the generative exponent of the link function. Core to our analysis is the reuse of the minibatch in the gradient computation, which gives rise to higher-order information beyond correlational queries.

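A toy version of the learning setup (illustrative and untuned hyperparameters; plain online SGD rather than the paper's algorithm, which additionally reuses each minibatch within the gradient computation): fit a small two-layer ReLU network to y = σ∗(⟨x, θ⟩) for a degree-2 polynomial link.

    import numpy as np

    rng = np.random.default_rng(0)
    d, width, lr, steps, batch = 32, 128, 0.05, 20_000, 64

    theta = rng.standard_normal(d); theta /= np.linalg.norm(theta)
    sigma_star = lambda z: z**2 - 1                # degree-2 polynomial link

    # Two-layer ReLU network trained by online SGD on the squared loss.
    W = rng.standard_normal((width, d)) / np.sqrt(d)
    b = np.zeros(width)
    a = rng.standard_normal(width) / np.sqrt(width)

    for _ in range(steps):
        X = rng.standard_normal((batch, d))        # fresh isotropic Gaussian data
        y = sigma_star(X @ theta)
        H = X @ W.T + b
        phi = np.maximum(H, 0.0)                   # ReLU features
        err = phi @ a - y                          # residual
        gate = (H > 0.0) * a                       # backprop through ReLU, scaled by a
        W -= lr * (err[:, None] * gate).T @ X / batch
        b -= lr * (err[:, None] * gate).mean(axis=0)
        a -= lr * phi.T @ err / batch

    X_test = rng.standard_normal((4000, d))
    mse = np.mean((np.maximum(X_test @ W.T + b, 0.0) @ a - sigma_star(X_test @ theta))**2)
    print(f"held-out mean-squared error: {mse:.3f}")   # compare to Var[y] = 2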

xVal: A Continuous Numerical Tokenization for Scientific Language Models

Siavash Golkar, Mariel Pettee, M. Eickenberg, A. Bietti, et al.

Due in part to their discontinuous and discrete default encodings for numbers, Large Language Models (LLMs) have not yet been commonly used to process numerically-dense scientific datasets. Rendering datasets as text, however, could help aggregate diverse and multi-modal scientific data into a single training corpus, thereby potentially facilitating the development of foundation models for science. In this work, we introduce xVal, a strategy for continuously tokenizing numbers within language models that results in a more appropriate inductive bias for scientific applications. By training specially-modified language models from scratch on a variety of scientific datasets formatted as text, we find that xVal generally outperforms other common numerical tokenization strategies on metrics including out-of-distribution generalization and computational efficiency.

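A minimal sketch of the encoding idea, with a hypothetical toy vocabulary: each literal number in the text is mapped to a single [NUM] token whose embedding is multiplicatively scaled by the number's value (value normalization and the numeric readout head used in practice are omitted).

    import re
    import numpy as np

    # Hypothetical tiny vocabulary; real models use a full text tokenizer.
    vocab = {"[NUM]": 0, "mass": 1, "=": 2, "kg": 3, ",": 4, "radius": 5, "m": 6}
    emb_dim = 8
    rng = np.random.default_rng(0)
    embedding = rng.standard_normal((len(vocab), emb_dim)) / np.sqrt(emb_dim)

    def xval_encode(text):
        """Replace each literal number with [NUM] and keep its value alongside."""
        token_ids, values = [], []
        for tok in text.split():
            if re.fullmatch(r"[-+]?\d*\.?\d+(e[-+]?\d+)?", tok):
                token_ids.append(vocab["[NUM]"])
                values.append(float(tok))
            else:
                token_ids.append(vocab[tok])
                values.append(1.0)          # non-numeric tokens are scaled by 1
        return np.array(token_ids), np.array(values)

    ids, vals = xval_encode("mass = 5.972e24 kg , radius = 6.371e6 m")
    # The continuous value multiplicatively scales the shared [NUM] embedding.
    inputs = embedding[ids] * vals[:, None]
    print(ids, vals, inputs.shape)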

Multiple Physics Pretraining for Physical Surrogate Models

Michael McCabe, B. Régaldo-Saint Blancard, Liam Holden Parker, R. Ohana, Miles Cranmer, A. Bietti, Michael Eickenberg, et al.

We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. In MPP, rather than training one model on a specific physical system, we train a backbone model to predict the dynamics of multiple heterogeneous physical systems simultaneously in order to learn features that are broadly useful across systems and facilitate transfer. In order to learn effectively in this setting, we introduce a shared embedding and normalization strategy that projects the fields of multiple systems into a shared embedding space. We validate the efficacy of our approach on both pretraining and downstream tasks over a broad fluid mechanics-oriented benchmark. We show that a single MPP-pretrained transformer is able to match or outperform task-specific baselines on all pretraining sub-tasks without the need for finetuning. For downstream tasks, we demonstrate that finetuning MPP-trained models results in more accurate predictions across multiple time-steps on systems with previously unseen physical components or higher dimensional systems compared to training from scratch or finetuning pretrained video foundation models. We open-source our code and model weights trained at multiple scales for reproducibility.

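One plausible realization of the shared embedding and normalization strategy, offered as an assumption rather than the paper's exact architecture: each physical field gets its own normalization and its own learned projection into a common embedding space, and a system contributes only the fields it actually has.

    import numpy as np

    rng = np.random.default_rng(0)
    emb_dim = 16
    field_names = ["density", "pressure", "velocity_x", "velocity_y"]
    proj = {f: rng.standard_normal(emb_dim) / np.sqrt(emb_dim) for f in field_names}

    def embed_system(fields):
        """fields: dict of name -> array of shape (H, W). Returns (H, W, emb_dim)."""
        out = None
        for name, grid in fields.items():
            rms = np.sqrt(np.mean(grid**2)) + 1e-8        # field-wise RMS normalization
            contrib = (grid / rms)[..., None] * proj[name]
            out = contrib if out is None else out + contrib
        return out

    # Two heterogeneous "systems" with different field sets map into the same
    # (H, W, emb_dim) space, so a single backbone can process both.
    sys_a = {"density": rng.random((32, 32)), "velocity_x": rng.random((32, 32))}
    sys_b = {"pressure": rng.random((32, 32)),
             "velocity_x": rng.random((32, 32)), "velocity_y": rng.random((32, 32))}
    print(embed_system(sys_a).shape, embed_system(sys_b).shape)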

Provable Posterior Sampling with Denoising Oracles via Tilted Transport

Joan Bruna, J. Han

Score-based diffusion models have significantly advanced high-dimensional data generation across various domains, by learning a denoising oracle (or score) from datasets. From a Bayesian perspective, they offer a realistic modeling of data priors and facilitate solving inverse problems through posterior sampling. Although many heuristic methods have been developed recently for this purpose, they lack the quantitative guarantees needed in many scientific applications. This work addresses the topic from two perspectives. We first present a hardness result indicating that a generic method leveraging the prior denoising oracle for posterior sampling becomes infeasible as soon as the measurement operator is mildly ill-conditioned. We next develop the tilted transport technique, which leverages the quadratic structure of the log-likelihood in linear inverse problems in combination with the prior denoising oracle to exactly transform the original posterior sampling problem into a new one that is provably easier to sample from. We quantify the conditions under which the boosted posterior is strongly log-concave, highlighting how task difficulty depends on the condition number of the measurement matrix and the signal-to-noise ratio. The resulting general scheme is shown to match the best-known sampling methods for Ising models, and is further validated on high-dimensional Gaussian mixture models.

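For context, the quadratic structure referred to above is that of the standard Bayesian linear inverse problem y = Ax + ε with Gaussian noise ε ∼ N(0, σ²I), whose posterior is

    \[
      p(x \mid y) \;\propto\; p(x)\,
      \exp\!\Big(-\tfrac{1}{2\sigma^{2}}\,\lVert y - Ax\rVert^{2}\Big);
    \]

the tilted transport technique combines this quadratic log-likelihood with the prior's denoising oracle to map the posterior to one that is provably easier to sample.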