CCM: Publications

Discriminative calibration: Check Bayesian computation from simulations and flexible classifier

Y. Yao, Justin Domke

To check the accuracy of Bayesian computations, it is common to use rank-based simulation-based calibration (SBC). However, SBC has drawbacks: The test statistic is somewhat ad-hoc, interactions are difficult to examine, multiple testing is a challenge, and the resulting p-value is not a divergence metric. We propose to replace the marginal rank test with a flexible classification approach that learns test statistics from data. This measure typically has a higher statistical power than the SBC rank test and returns an interpretable divergence measure of miscalibration, computed from classification accuracy. This approach can be used with different data generating processes to address likelihood-free inference or traditional inference methods like Markov chain Monte Carlo or variational inference. We illustrate an automated implementation using neural networks and statistically-inspired features, and validate the method with numerical and real data experiments.

Show Abstract

Trapped acoustic waves and raindrops: high-order accurate integral equation method for localized excitation of a periodic staircase

F. Agocs, A. Barnett

We present a high-order boundary integral equation (BIE) method for the frequency-domain acoustic scattering of a point source by a singly-periodic, infinite, corrugated boundary. We apply it to the accurate numerical study of acoustic radiation in the neighborhood of a sound-hard two-dimensional staircase modeled after the El Castillo pyramid. Such staircases support trapped waves which travel along the surface and decay exponentially away from it. We use the array scanning method (Floquet--Bloch transform) to recover the scattered field as an integral over the family of quasiperiodic solutions parameterized by their on-surface wavenumber. Each such BIE solution requires the quasiperiodic Green's function, which we evaluate using an efficient integral representation of lattice sum coefficients. We avoid the singularities and branch cuts present in the array scanning integral by complex contour deformation. For each frequency, this enables a solution accurate to around 10 digits in a couple of seconds. We propose a residue method to extract the limiting powers carried by trapped modes far from the source. Finally, by computing the trapped mode dispersion relation, we use a simple ray model to explain an observed acoustic "raindrop" effect (chirp-like time-domain response).

Show Abstract

MoMo: Momentum Models for Adaptive Learning Rates

Fabian Schaipp, R. Ohana, M. Eickenberg, Aaron Defazio, R. M. Gower

Training a modern machine learning architecture on a new task requires extensive learning-rate tuning, which comes at a high computational cost. Here we develop new adaptive learning rates that can be used with any momentum method, and require less tuning to perform well. We first develop MoMo, a Momentum Model based adaptive learning rate for SGD-M (Stochastic gradient descent with momentum). MoMo uses momentum estimates of the batch losses and gradients sampled at each iteration to build a model of the loss function. Our model also makes use of any known lower bound of the loss function by using truncation, e.g. most losses are lower-bounded by zero. We then approximately minimize this model at each iteration to compute the next step. We show how MoMo can be used in combination with any momentum-based method, and showcase this by developing MoMo-Adam - which is Adam with our new model-based adaptive learning rate. Additionally, for losses with unknown lower bounds, we develop on-the-fly estimates of a lower bound, that are incorporated in our model. Through extensive numerical experiments, we demonstrate that MoMo and MoMo-Adam improve over SGD-M and Adam in terms of accuracy and robustness to hyperparameter tuning for training image classifiers on MNIST, CIFAR10, CIFAR100, Imagenet, recommender systems on the Criteo dataset, and a transformer model on the translation task IWSLT14.

Show Abstract

Stabilizing the calculation of the self-energy in dynamical mean-field theory using constrained residual minimization

Harrison LaBollita, J. Kaye, Alexander Hampel

We propose a simple and efficient method to calculate the electronic self-energy in dynamical mean-field theory (DMFT), addressing a numerical instability often encountered when solving the Dyson equation. Our approach formulates the Dyson equation as a constrained optimization problem with a simple quadratic objective. The constraints on the self-energy are obtained via direct measurement of the leading order terms of its asymptotic expansion within a continuous time quantum Monte Carlo framework, and the use of the compact discrete Lehmann representation of the self-energy yields an optimization problem in a modest number of unknowns. We benchmark our method for the non-interacting Bethe lattice, as well as DMFT calculations for both model systems and \textit{ab-initio} applications.

Show Abstract

A class of dimension-free metrics for the convergence of empirical measures

J. Han, Ruimeng Hu, Jihao Long

This paper concerns the convergence of empirical measures in high dimensions. We propose a new class of probability metrics and show that under such metrics, the convergence is free of the curse of dimensionality (CoD). Such a feature is critical for high-dimensional analysis and stands in contrast to classical metrics (e.g., the Wasserstein metric). The proposed metrics fall into the category of integral probability metrics, for which we specify criteria of test function spaces to guarantee the property of being free of CoD. Examples of the selected test function spaces include the reproducing kernel Hilbert spaces, Barron space, and flow-induced function spaces. Three applications of the proposed metrics are presented: 1. The convergence of empirical measure in the case of random variables; 2. The convergence of n-particle system to the solution to McKean–Vlasov stochastic differential equation; 3. The construction of an ɛ-Nash equilibrium for a homogeneous n-player game by its mean-field limit. As a byproduct, we prove that, given a distribution close to the target distribution measured by our metric and a certain representation of the target distribution, we can generate a distribution close to the target one in terms of the Wasserstein metric and relative entropy. Overall, we show that the proposed class of metrics is a powerful tool to analyze the convergence of empirical measures in high dimensions without CoD.

Show Abstract

A Neural Network Warm-Start Approach for the Inverse Acoustic Obstacle Scattering Problem

Mo Zhou, J. Han, M. Rachh, Carlos Borges

In this paper, we consider the inverse acoustic obstacle problem for sound-soft star-shaped obstacles in two dimensions wherein the boundary of the obstacle is determined from measurements of the scattered field at a collection of receivers outside the object. One of the standard approaches for solving this problem is to reformulate it as an optimization problem: finding the boundary of the domain that minimizes the L2 distance between computed values of the scattered field and the given measurement data. The optimization problem is computationally challenging since the local set of convexity shrinks with increasing frequency and results in an increasing number of local minima in the vicinity of the true solution. In many practical experimental settings, low frequency measurements are unavailable due to limitations of the experimental setup or the sensors used for measurement. Thus, obtaining a good initial guess for the optimization problem plays a vital role in this environment. We present a neural network warm-start approach for solving the inverse scattering problem, where an initial guess for the optimization problem is obtained using a trained neural network. We demonstrate the effectiveness of our method with several numerical examples. For high frequency problems, this approach outperforms traditional iterative methods such as Gauss-Newton initialized without any prior (i.e., initialized using a unit circle), or initialized using the solution of a direct method such as the linear sampling method. The algorithm remains robust to noise in the scattered field measurements and also converges to the true solution for limited aperture data. However, the number of training samples required to train the neural network scales exponentially in frequency and the complexity of the obstacles considered. We conclude with a discussion of this phenomenon and potential directions for future research.

Show Abstract

Variational Inference with Gaussian Score Matching

C. Modi, C. Margossian, Y. Yao, R. M. Gower, D. Blei, L. Saul

Variational inference (VI) is a method to approximate the computationally intractable posterior distributions that arise in Bayesian statistics. Typically, VI fits a simple parametric distribution to be close to the target posterior, optimizing an appropriate objective such as the evidence lower bound (ELBO). In this work, we present a new approach to VI. Our method is based on the principle of score matching---namely, that if two distributions are equal then their score functions (i.e., gradients of the log density) are equal at every point on their support. With this principle, we develop score-matching VI, an iterative algorithm that seeks to match the scores between the variational approximation and the exact posterior. At each iteration, score-matching VI solves an inner optimization, one that minimally adjusts the current variational estimate to match the scores at a newly sampled value of the latent variables. We show that when the variational family is a Gaussian, this inner optimization enjoys a closed-form solution, which we call Gaussian score matching VI (GSM-VI). GSM-VI is a ``black box'' variational algorithm in that it only requires a differentiable joint distribution, and as such it can be applied to a wide class of models. We compare GSM-VI to black box variational inference (BBVI), which has similar requirements but instead optimizes the ELBO. We first study how GSM-VI behaves as a function of the problem dimensionality, the condition number of the target covariance matrix (when the target is Gaussian), and the degree of mismatch between the approximating and exact posterior distribution. We then study GSM-VI on a collection of real-world Bayesian inference problems from the posteriorDB database of datasets and models. We find that GSM-VI is faster than BBVI and equally or more accurate. Specifically, over a wide range of target posteriors, GSM-VI requires 10-100x fewer gradient evaluations than BBVI to obtain a comparable quality of approximation.

Show Abstract

A Dual-space Multilevel Kernel-splitting Framework for Discrete and Continuous Convolution

S. Jiang, L. Greengard

We introduce a new class of multilevel, adaptive, dual-space methods for computing fast convolutional transforms. These methods can be applied to a broad class of kernels, from the Green's functions for classical partial differential equations (PDEs) to power functions and radial basis functions such as those used in statistics and machine learning. The DMK (dual-space multilevel kernel-splitting) framework uses a hierarchy of grids, computing a smoothed interaction at the coarsest level, followed by a sequence of corrections at finer and finer scales until the problem is entirely local, at which point direct summation is applied. The main novelty of DMK is that the interaction at each scale is diagonalized by a short Fourier transform, permitting the use of separation of variables, but without requiring the FFT for its asymptotic performance. The DMK framework substantially simplifies the algorithmic structure of the fast multipole method (FMM) and unifies the FMM, Ewald summation, and multilevel summation, achieving speeds comparable to the FFT in work per gridpoint, even in a fully adaptive context. For continuous source distributions, the evaluation of local interactions is further accelerated by approximating the kernel at the finest level as a sum of Gaussians with a highly localized remainder. The Gaussian convolutions are calculated using tensor product transforms, and the remainder term is calculated using asymptotic methods. We illustrate the performance of DMK for both continuous and discrete sources with extensive numerical examples in two and three dimensions.

Show Abstract

Compressing the memory variables in constant-Q viscoelastic wave propagation via an improved sum-of-exponentials approximation

Xu Guo, S. Jiang, Yunfeng Xiong, Jiwei Zhang

Earth introduces strong attenuation and dispersion to propagating waves. The time-fractional wave equation with very small fractional exponent, based on Kjartansson's constant-Q theory, is widely recognized in the field of geophysics as a reliable model for frequency-independent Q anelastic behavior. Nonetheless, the numerical resolution of this equation poses considerable challenges due to the requirement of storing a complete time history of wavefields. To address this computational challenge, we present a novel approach: a nearly optimal sum-of-exponentials (SOE) approximation to the Caputo fractional derivative with very small fractional exponent, utilizing the machinery of generalized Gaussian quadrature. This method minimizes the number of memory variables needed to approximate the power attenuation law within a specified error tolerance. We establish a mathematical equivalence between this SOE approximation and the continuous fractional stress-strain relationship, relating it to the generalized Maxwell body model. Furthermore, we prove an improved SOE approximation error bound to thoroughly assess the ability of rheological models to replicate the power attenuation law. Numerical simulations on constant-Q viscoacoustic equation in 3D homogeneous media and variable-order P- and S- viscoelastic wave equations in 3D inhomogeneous media are performed. These simulations demonstrate that our proposed technique accurately captures changes in amplitude and phase resulting from material anelasticity. This advancement provides a significant step towards the practical usage of the time-fractional wave equation in seismic inversion.

Show Abstract

A Gentle Introduction to Gradient-Based Optimization and Variational Inequalities for Machine Learning

N. Wadia, Yatin Dandi, Michael I. Jordan

The rapid progress in machine learning in recent years has been based on a highly productive connection to gradient-based optimization. Further progress hinges in part on a shift in focus from pattern recognition to decision-making and multi-agent problems. In these broader settings, new mathematical challenges emerge that involve equilibria and game theory instead of optima. Gradient-based methods remain essential -- given the high dimensionality and large scale of machine-learning problems -- but simple gradient descent is no longer the point of departure for algorithm design. We provide a gentle introduction to a broader framework for gradient-based algorithms in machine learning, beginning with saddle points and monotone games, and proceeding to general variational inequalities. While we provide convergence proofs for several of the algorithms that we present, our main focus is that of providing motivation and intuition.

Show Abstract