443 Publications

The No-Underrun Sampler: A Locally-Adaptive, Gradient-Free MCMC Method

N. Bou-Rabee, B. Carpenter, S. Liu, Stefan Oberdörster

In this work, we introduce the No-Underrun Sampler (NURS), a locally-adaptive, gradient-free Markov chain Monte Carlo method that blends ideas from Hit-and-Run and the No-U-Turn Sampler. NURS dynamically adapts to the local scale of the target distribution without requiring gradient evaluations, making it especially suitable for applications where gradients are unavailable or costly. We establish key theoretical properties, including reversibility, formal connections to Hit-and-Run and Random Walk Metropolis, Wasserstein contraction comparable to Hit-and-Run in Gaussian targets, and bounds on the total variation distance between the transition kernels of Hit-and-Run and NURS. Empirical experiments, supported by theoretical insights, illustrate the ability of NURS to sample from Neal's funnel, a challenging multi-scale distribution from Bayesian hierarchical inference.
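
The Hit-and-Run ingredient mentioned above is the easiest piece to make concrete. Below is a minimal, gradient-free Hit-and-Run step for an unnormalized log-density, assuming a simple grid-based sampler along the chosen line; it illustrates the building block NURS starts from, not the locally adaptive orbit construction of NURS itself.

```python
import numpy as np

def hit_and_run_step(x, logp, rng, half_width=5.0, n_grid=256):
    """One gradient-free Hit-and-Run update: pick a uniformly random direction,
    then resample the state from the target restricted to the line through x.
    The finite grid and half_width are illustrative discretization choices."""
    d = rng.standard_normal(x.shape)
    d /= np.linalg.norm(d)                      # uniform direction on the sphere
    t = np.linspace(-half_width, half_width, n_grid)
    logw = np.array([logp(x + ti * d) for ti in t])
    w = np.exp(logw - logw.max())               # unnormalized 1D conditional along the line
    return x + rng.choice(t, p=w / w.sum()) * d
```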


A fully adaptive, high-order, fast Poisson solver for complex two-dimensional geometries

We present a new framework for the fast solution of inhomogeneous elliptic boundary value problems in domains with smooth boundaries. High-order solvers based on adaptive box codes or the fast Fourier transform can efficiently treat the volumetric inhomogeneity, but require care to be taken near the boundary to ensure that the volume data is globally smooth. We avoid function extension or cut-cell quadratures near the boundary by dividing the domain into two regions: a bulk region away from the boundary that is efficiently treated with a truncated free-space box code, and a variable-width boundary-conforming strip region that is treated with a spectral collocation method and accompanying fast direct solver. Particular solutions in each region are then combined with Laplace layer potentials to yield the global solution. The resulting solver has an optimal computational complexity of O(N) for an adaptive discretization with N degrees of freedom. With an efficient two-dimensional (2D) implementation we demonstrate adaptive resolution of volumetric data, boundary data, and geometric features across a wide range of length scales, to typically 10-digit accuracy. The cost of all boundary corrections remains small relative to that of the bulk box code. The extension to 3D is expected to be straightforward in many cases because the strip
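
Schematically, the splitting described above can be summarized as follows for a Dirichlet Poisson problem; the notation is shorthand for the abstract's description, not the paper's:

\[
-\Delta u = f \ \text{in } \Omega, \quad u = g \ \text{on } \partial\Omega, \qquad u = u_{\mathrm{bulk}} + u_{\mathrm{strip}} + u_{\mathrm{hom}},
\]

where \(u_{\mathrm{bulk}}\) is the free-space particular solution produced by the truncated box code away from the boundary, \(u_{\mathrm{strip}}\) is the particular solution in the boundary-conforming strip produced by spectral collocation with a fast direct solver, and \(u_{\mathrm{hom}}\) is a harmonic correction built from Laplace layer potentials, with density chosen so that the combined solution satisfies the boundary condition.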


Understanding Optimization in Deep Learning with Central Flows

J. Cohen, Alex Damian, Ameet Talwalkar, J. Zico Kolter, Jason D. Lee

Optimization in deep learning remains poorly understood. A key difficulty is that optimizers exhibit complex oscillatory dynamics, referred to as "edge of stability," which cannot be captured by traditional optimization theory. In this paper, we show that the path taken by an oscillatory optimizer can often be captured by a central flow: a differential equation which directly models the time-averaged (i.e. smoothed) optimization trajectory. We empirically show that these central flows can predict long-term optimization trajectories for generic neural networks with a high degree of numerical accuracy. By interpreting these flows, we are able to understand how gradient descent makes progress even as the loss sometimes goes up; how adaptive optimizers "adapt" to the local loss landscape; and how adaptive optimizers implicitly seek out regions of weight space where they can take larger steps. These insights (and others) are not apparent from the optimizers' update rules, but are revealed by the central flows. Therefore, we believe that central flows constitute a promising tool for reasoning about optimization in deep learning.
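
As a reading aid, "time-averaged trajectory" can be formalized as follows; this is a generic statement of the idea in the abstract, not the paper's specific construction of the flow. If \(w_t\) denotes the oscillatory optimizer iterates, a central flow is an ordinary differential equation

\[
\frac{d\bar w}{dt} = F(\bar w), \qquad \bar w(t) \approx \frac{1}{2k+1}\sum_{|s|\le k} w_{t+s},
\]

whose solution tracks a local average of the iterates, so that long-horizon behavior can be predicted by integrating \(F\) rather than simulating the oscillations themselves.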


Discrete Lehmann representation of three-point functions

Dominik Kiese, Hugo U. R. Strand, Kun Chen, Nils Wentzell, Olivier Parcollet, J. Kaye

We present a generalization of the discrete Lehmann representation (DLR) to three-point correlation and vertex functions in imaginary time and Matsubara frequency. The representation takes the form of a linear combination of judiciously chosen exponentials in imaginary time, and products of simple poles in Matsubara frequency, which are universal for a given temperature and energy cutoff. We present a systematic algorithm to generate compact sampling grids, from which the coefficients of such an expansion can be obtained by solving a linear system. We show that the explicit form of the representation can be used to evaluate diagrammatic expressions involving infinite Matsubara sums, such as polarization functions or self-energies, with controllable, high-order accuracy. This collection of techniques establishes a framework through which methods involving three-point objects can be implemented robustly, with a substantially reduced computational cost and memory footprint.
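
In Matsubara frequency, the structure described above (a universal pole set with fitted coefficients) looks schematically like

\[
G(i\nu_m, i\nu_n) \;\approx\; \sum_{j,k} g_{jk}\, \frac{1}{i\nu_m - \omega_j}\,\frac{1}{i\nu_n - \omega_k},
\]

where the poles \(\{\omega_j\}\) are universal for a given temperature and energy cutoff, and the coefficients \(g_{jk}\) are obtained by solving a linear system against values on the compact sampling grid; in imaginary time the same object is a linear combination of products of exponentials. This is only a sketch of the form stated in the abstract; the precise representation, including how the different frequency arguments are handled, is given in the paper.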


Classical variational phase-field models cannot predict fracture nucleation

Oscar Lopez-Pamies, John E. Dolbow, G. Francfort, Christopher J. Larsen

Notwithstanding the evidence against them, classical variational phase-field models continue to be used and pursued in an attempt to describe fracture nucleation in elastic brittle materials. In this context, the main objective of this paper is to provide a comprehensive review of the existing evidence against such a class of models as descriptors of fracture nucleation. To that end, a review is first given of the plethora of experimental observations of fracture nucleation in nominally elastic brittle materials under quasi-static loading conditions, as well as of classical variational phase-field models, without and with energy splits. These models are then confronted with the experimental observations. The conclusion is that they cannot possibly describe fracture nucleation in general. This is because classical variational phase-field models cannot account for material strength as an independent macroscopic material property. The last part of the paper includes a brief summary of a class of phase-field models that can describe fracture nucleation. It also provides a discussion of how pervasively material strength has been overlooked in the analysis of fracture at large, as well as an outlook into the modeling of fracture nucleation beyond the basic setting of elastic brittle materials.


An Accurate and Efficient Scheme for Function Extension on Smooth Domains

C. Epstein, F. Fryklund, S. Jiang

A new scheme is proposed to construct an \(n\)-times differentiable function extension of an \(n\)-times differentiable function defined on a smooth domain, \(D\), in \(d\) dimensions. The extension scheme relies on an explicit formula consisting of a linear combination of \(n+1\) function values in \(D\), which extends the function along directions normal to the boundary. Smoothness tangent to the boundary is automatic. The performance of the scheme is illustrated by using function extension as part of a numerical solver for the Poisson equation on domains with complex geometry in both two and three dimensions. Although the cost of extending the function increases mildly with the extension order, it remains a small fraction of the overall algorithm. Moreover, the modest additional work required for high order function extensions leads to considerably more accurate solutions of the partial differential equation.
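
The "linear combination of \(n+1\) function values" along the boundary normal is in the spirit of classical reflection-type (Hestenes/Seeley) extensions. The sketch below computes coefficients of that kind for the 1D normal coordinate; the reflection ratios and the specific formula are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def reflection_coeffs(n, a=None):
    # Coefficients c_j of a C^n reflection-type extension along the normal coordinate:
    #   f_ext(t) = sum_j c_j * f(-a_j * t)  for t > 0, given f on t <= 0,
    # with derivatives of order 0..n matched at t = 0, i.e.
    #   sum_j c_j * (-a_j)^m = 1  for m = 0, ..., n  (a Vandermonde system).
    if a is None:
        a = np.arange(1.0, n + 2)               # n+1 distinct reflection ratios (illustrative)
    V = np.vander(-a, N=n + 1, increasing=True).T
    return np.linalg.solve(V, np.ones(n + 1)), a

def extend(f, t, n):
    # Evaluate the extension at points t > 0 using only values of f at t <= 0.
    c, a = reflection_coeffs(n)
    return sum(cj * f(-aj * np.asarray(t)) for cj, aj in zip(c, a))
```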


How does Mamba Perform Associative Recall? A Mechanistic Study

Grégoire Le Corre, T. Huang, A. Bietti

Mamba has recently emerged as a promising alternative to Transformers, demonstrating competitive performance in many language modeling tasks with linear-time computational complexity. Theoretical characterization of Mamba has largely focused on its approximation power for solving certain tasks through specific constructions. However, it remains unclear whether Mamba trained with gradient descent can learn such constructions. As a first step to address this gap, we perform a mechanistic study of simplified Mamba models on associative recall tasks. By analyzing the learned model weights and the hidden state evolution, we uncover the mechanisms used by simplified Mamba models to perform associative recall. We complement our study with theoretical analysis on the optimization dynamics of simplified Mamba models that give rise to such mechanisms.
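
For readers unfamiliar with the task, a standard synthetic associative recall instance lists key-value pairs and then queries one of the keys; the generator below is that conventional setup, with vocabulary sizes and sequence lengths chosen for illustration rather than taken from the paper.

```python
import numpy as np

def recall_batch(rng, batch=32, n_pairs=8, n_keys=16, n_vals=16):
    """Sequences of the form [k1, v1, ..., kP, vP, query_key]; the label is the
    value that was paired with query_key earlier in the same sequence."""
    keys = np.stack([rng.choice(n_keys, size=n_pairs, replace=False) for _ in range(batch)])
    vals = rng.integers(0, n_vals, size=(batch, n_pairs)) + n_keys   # value tokens disjoint from keys
    seq = np.empty((batch, 2 * n_pairs + 1), dtype=np.int64)
    seq[:, 0:2 * n_pairs:2] = keys
    seq[:, 1:2 * n_pairs:2] = vals
    q = rng.integers(0, n_pairs, size=batch)                         # which pair is queried
    seq[:, -1] = keys[np.arange(batch), q]
    return seq, vals[np.arange(batch), q]
```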


Randomized Nyström Approximation of Non-negative Self-Adjoint Operators

D. Persson, Nicolas Boullé, Daniel Kressner

The randomized singular value decomposition (SVD) has become a popular approach to computing cheap, yet accurate, low-rank approximations to matrices due to its efficiency and strong theoretical guarantees. Recent work by Boullé and Townsend [Found. Comput. Math., 23 (2023), pp. 709–739] presents an infinite-dimensional analogue of the randomized SVD to approximate Hilbert–Schmidt operators. However, many applications involve computing low-rank approximations to symmetric positive semi-definite matrices. In this setting, it is well established that the randomized Nyström approximation is usually preferred over the randomized SVD. This paper explores an infinite-dimensional analogue of the Nyström approximation to compute low-rank approximations to non-negative self-adjoint trace-class operators. We present an analysis of the method and, along the way, improve the existing infinite-dimensional bounds for the randomized SVD. Our analysis yields bounds on the expected value and tail bounds for the Nyström approximation error in the operator, trace, and Hilbert–Schmidt norms. Numerical experiments on integral operators arising from Gaussian process sampling and Bayesian inverse problems are used to validate the proposed infinite-dimensional Nyström algorithm.
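
In the familiar finite-dimensional setting, the randomized Nyström approximation needs one sketch of the matrix and a small solve; the paper's contribution is the infinite-dimensional, operator-valued analogue and its error analysis. A minimal matrix-level sketch of the textbook algorithm:

```python
import numpy as np

def nystrom(A, r, rng=None):
    """Rank-r randomized Nystrom approximation of a symmetric PSD matrix A:
       A_hat = Y (Om^T Y)^+ Y^T,  Y = A @ Om,  with a Gaussian sketch Om.
    (Stabilized variants add a small shift to Y before forming the pseudoinverse.)"""
    rng = rng or np.random.default_rng(0)
    Om = rng.standard_normal((A.shape[0], r))
    Y = A @ Om
    return Y @ np.linalg.pinv(Om.T @ Y) @ Y.T
```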


Equispaced Fourier representations for efficient Gaussian process regression from a billion data points

Philip Greengard, M. Rachh, A. Barnett

We introduce a Fourier-based fast algorithm for Gaussian process regression in low dimensions. It approximates a translationally invariant covariance kernel by complex exponentials on an equispaced Cartesian frequency grid of \(M\) nodes. This results in a weight-space \(M\times M\) system matrix with Toeplitz structure, which can thus be applied to a vector in \({\mathcal O}(M \log{M})\) operations via the fast Fourier transform (FFT), independent of the number of data points \(N\). The linear system can be set up in \({\mathcal O}(N+M \log{M})\) operations using nonuniform FFTs. This enables efficient massive-scale regression via an iterative solver, even for kernels with fat-tailed spectral densities (large \(M\)). We provide bounds on both kernel approximation and posterior mean errors. Numerical experiments for squared-exponential and Matérn kernels in one, two, and three dimensions often show 1–2 orders of magnitude acceleration over state-of-the-art rank-structured solvers at comparable accuracy. Our method allows two-dimensional Matérn-\(\frac{3}{2}\) regression from \(N=10^9\) data points to be performed in two minutes on a standard desktop, with posterior mean accuracy \(10^{-3}\). This opens up spatial statistics applications 100 times larger than previously possible.
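
The \({\mathcal O}(M \log{M})\) application cost rests on the standard circulant-embedding trick for Toeplitz matrices, sketched below in one dimension; the block-Toeplitz structure in higher dimensions and the NUFFT setup of the right-hand side are not shown.

```python
import numpy as np

def toeplitz_matvec(col, row, x):
    """Apply the M x M Toeplitz matrix with first column `col` and first row `row`
    to a vector x in O(M log M): embed it in a 2M x 2M circulant and use the FFT."""
    M = len(x)
    c = np.concatenate([col, [0.0], row[:0:-1]])   # circulant generator, length 2M
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(np.concatenate([x, np.zeros(M)])))
    return y[:M].real
```

With a matrix-vector product this cheap, an iterative solver has per-iteration cost independent of the number of data points \(N\) once the system has been set up with nonuniform FFTs, which is what makes regression at the billion-point scale described above feasible.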


A mixing time bound for Gibbs sampling from log-smooth log-concave distributions

The Gibbs sampler, also known as the coordinate hit-and-run algorithm, is a Markov chain that is widely used to draw samples from probability distributions in arbitrary dimensions. At each iteration of the algorithm, a randomly selected coordinate is resampled from the distribution that results from conditioning on all the other coordinates. We study the behavior of the Gibbs sampler on the class of log-smooth and strongly log-concave target distributions supported on ℝ
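
The update described above (resample one randomly chosen coordinate from its conditional given the rest) is easy to state concretely for a Gaussian target, where the full conditionals are themselves Gaussian. The example below is such an illustration and is not taken from the paper.

```python
import numpy as np

def gibbs_step_gaussian(x, Prec, rng):
    """One random-scan Gibbs update for a zero-mean Gaussian with precision matrix Prec:
    pick a coordinate i uniformly and resample it from its full conditional, which is
    N(mean_i, 1 / Prec[i, i]) with mean_i = -(Prec[i, rest] @ x[rest]) / Prec[i, i]."""
    i = rng.integers(len(x))
    rest = np.delete(np.arange(len(x)), i)
    mean_i = -Prec[i, rest] @ x[rest] / Prec[i, i]
    x = x.copy()
    x[i] = mean_i + rng.standard_normal() / np.sqrt(Prec[i, i])
    return x
```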
