443 Publications

Simulation-Based Stacking

Yuling Yao , B. Régaldo-Saint Blancard, Justin Domke

Simulation-based inference has been popular for amortized Bayesian computation. It is typical to have more than one posterior approximation, from different inference algorithms, different architectures, or simply the randomness of initialization and stochastic gradients. With a consistency guarantee, we present a general posterior stacking framework to make use of all available approximations. Our stacking method is able to combine densities, simulation draws, conf idence intervals, and moments, and address the overall precision, calibration, coverage, and bias of the posterior approximation at the same time. We illustrate our method on several benchmark simulations and a challenging cosmological inference task.

Show Abstract

GIST: Gibbs self-tuning for locally adaptive Hamiltonian Monte Carlo

N. Bou-Rabee, B. Carpenter, Milo Marsden

We introduce a novel and flexible framework for constructing locally adaptive Hamiltonian Monte Carlo (HMC) samplers by Gibbs sampling the algorithm's tuning parameters conditionally based on the position and momentum at each step. For adaptively sampling path lengths, this framework -- which we call Gibbs self-tuning (GIST) -- encompasses randomized HMC, multinomial HMC, the No-U-Turn Sampler (NUTS), and the Apogee-to-Apogee Path Sampler as special cases. The GIST framework is illustrated with a novel alternative to NUTS for locally adapting path lengths, evaluated with an exact Hamiltonian for a high-dimensional, ill-conditioned Gaussian measure and with the leapfrog integrator for a suite of diverse models.

Show Abstract

Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data

Atsushi Nitanda, Kazusato Oko, Taiji Suzuki, D. Wu

Recent works have shown that neural networks optimized by gradient-based methods can adapt to sparse or low-dimensional target functions through feature learning; an often studied target is the sparse parity function on the unit hypercube. However, such isotropic data setting does not capture the anisotropy and low intrinsic dimensionality exhibited in realistic datasets. In this work, we address this shortcoming by studying how gradient-based feature learning interacts with structured (anisotropic) input data: we consider the classification of -sparse parity on high-dimensional orthotope where the feature coordinates have varying magnitudes, and analyze the learning complexity of the mean-field Langevin dynamics (MFLD), which describes the noisy gradient descent update on two-layer neural network. We show that the statistical complexity (i.e. sample size) and computational complexity (i.e. network width) of MFLD can both be improved when prominent directions of the anisotropic input data align with the support of the target function. Moreover, by employing a coordinate transform determined by the gradient covariance, the width can be made independent of the target degree. Lastly, we demonstrate the benefit of feature learning by establishing a kernel lower bound on the classification error, which applies to neural networks in the lazy regime.

Show Abstract

Efficient convergent boundary integral methods for slender bodies

The interaction of fibers in a viscous (Stokes) fluid plays a crucial role in industrial and biological processes, such as sedimentation, rheology, transport, cell division, and locomotion. Numerical simulations generally rely on slender body theory (SBT), an asymptotic, nonconvergent approximation whose error blows up as fibers approach each other. Yet convergent boundary integral equation (BIE) methods which completely resolve the fiber surface have so far been impractical due to the prohibitive cost of layer-potential quadratures in such high aspect-ratio 3D geometries. We present a high-order Nyström quadrature scheme with aspect-ratio independent cost, making such BIEs practical. It combines centerline panels (each with a small number of poloidal Fourier modes), toroidal Green's functions, generalized Chebyshev quadratures, HPC parallel implementation, and FMM acceleration. We also present new BIE formulations for slender bodies that lead to well conditioned linear systems upon discretization. We test Laplace and Stokes Dirichlet problems, and Stokes mobility problems, for slender rigid closed fibers with (possibly varying) circular cross-section, at separations down to 1/20 of the slender radius, reporting convergence typically to at least 10 digits. We use this to quantify the breakdown of numerical SBT for close-to-touching rigid fibers. We also apply the methods to time-step the sedimentation of 512 loops with up to 1.65 million unknowns at around 7 digits of accuracy.

Show Abstract

Galaxy clustering analysis with SimBIG and the wavelet scattering transform

B. Régaldo-Saint Blancard, ChangHoon Hahn, Shirley Ho, Jiamin Hou, Pablo Lemos, Elena Massara , C. Modi, Azadeh Moradinezhad Dizgah, Liam Parker, Y. Yao, M. Eickenberg

The non-Gaussian spatial distribution of galaxies traces the large-scale structure of the Universe and therefore constitutes a prime observable to constrain cosmological parameters. We conduct Bayesian inference of the Λ CDM parameters Ωm, Ωb, h , ns, and σ8 from the Baryon Oscillation Spectroscopic Survey CMASS galaxy sample by combining the wavelet scattering transform (WST) with a simulation-based inference approach enabled by the SimBIG forward model. We design a set of reduced WST statistics that leverage symmetries of redshift-space data. Posterior distributions are estimated with a conditional normalizing flow trained on 20,000 simulated SimBIG galaxy catalogs with survey realism. We assess the accuracy of the posterior estimates using simulation-based calibration and quantify generalization and robustness to the change of forward model using a suite of 2000 test simulations. When probing scales down to kmax=0.5 h /Mpc , we are able to derive accurate posterior estimates that are robust to the change of forward model for all parameters, except σ8. We mitigate the robustness issues with σ8 by removing the WST coefficients that probe scales smaller than k ∼0.3 h /Mpc . Applied to the Baryon Oscillation Spectroscopic Survey CMASS sample, our WST analysis yields seemingly improved constraints obtained from a standard perturbation-theory-based power spectrum analysis with kmax=0.25 h /Mpc for all parameters except h . However, we still raise concerns on these results. The observational predictions significantly vary across different normalizing flow architectures, which we interpret as a form of model misspecification. This highlights a key challenge for forward modeling approaches when using summary statistics that are sensitive to detailed model-specific or observational imprints on galaxy clustering.

Show Abstract

Escaping saddle points efficiently with occupation-time-adapted perturbation

Xin Guo , J. Han, Mahan Tajrobehkar , Wenpin Tang

Motivated by the super-diffusivity of self-repelling random walk, which has roots in statistical physics, this paper develops a new perturbation mechanism for optimization algorithms. In this mechanism, perturbations are adapted to the history of states via the notion of occupation time. After integrating this mechanism into the framework of perturbed gradient descent (PGD) and perturbed accelerated gradient descent (PAGD), two new algorithms are proposed: perturbed gradient descent adapted to occupation time (PGDOT) and its accelerated version (PAGDOT). PGDOT and PAGDOT are guaranteed to avoid getting stuck at non-degenerate saddle points, and are shown to converge to second-order stationary points at least as fast as PGD and PAGD, respectively. The theoretical analysis is corroborated by empirical studies in which the new algorithms consistently escape saddle points and outperform not only their counterparts, PGD and PAGD, but also other popular alternatives including stochastic gradient descent, Adam, and several state-of-the-art adaptive gradient methods.

Show Abstract

Statistical Component Separation for Targeted Signal Recovery in Noisy Mixtures

B. Régaldo-Saint Blancard, M. Eickenberg

Separating signals from an additive mixture may be an unnecessarily hard problem when one is only interested in specific properties of a given signal. In this work, we tackle simpler "statistical component separation" problems that focus on recovering a predefined set of statistical descriptors of a target signal from a noisy mixture. Assuming access to samples of the noise process, we investigate a method devised to match the statistics of the solution candidate corrupted by noise samples with those of the observed mixture. We first analyze the behavior of this method using simple examples with analytically tractable calculations. Then, we apply it in an image denoising context employing 1) wavelet-based descriptors, 2) ConvNet-based descriptors on astrophysics and ImageNet data. In the case of 1), we show that our method better recovers the descriptors of the target data than a standard denoising method in most situations. Additionally, despite not constructed for this purpose, it performs surprisingly well in terms of peak signal-to-noise ratio on full signal reconstruction. In comparison, representation 2) appears less suitable for image denoising. Finally, we extend this method by introducing a diffusive stepwise algorithm which gives a new perspective to the initial method and leads to promising results for image denoising under specific circumstances.

Show Abstract

Nonlinear spiked covariance matrices and signal propagation in deep neural networks

Zhichao Wang, D. Wu, Zhou Fan

Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) def ined by the nonlinear feature map of a feedforward neural network. However, existing results only establish weak convergence of the empirical eigenvalue distribution, and fall short of providing precise quantitative characterizations of the “spike” eigenvalues and eigenvectors that often capture the low-dimensional signal structure of the learning problem. In this work, we characterize these signal eigenvalues and eigenvectors for a nonlinear version of the spiked covariance model, including the CK as a special case. Using this general result, we give a quantitative description of how spiked eigenstructure in the input data propagates through the hidden layers of a neural network with random weights. As a second application, we study a simple regime of representation learning where the weight matrix develops a rank-one signal component over training and characterize the alignment of the target function with the spike eigenvector of the CK on test data.

Show Abstract

For how many iterations should we run Markov chain Monte Carlo?

C. Margossian, Andrew Gelman

Standard Markov chain Monte Carlo (MCMC) admits three fundamental control parameters: the number of chains, the length of the warmup phase, and the length of the sampling phase. These control parameters play a large role in determining the amount of computation we deploy. In practice, we need to walk a line between achieving sufficient precision and not wasting precious computational resources and time. We review general strategies to check the length of the warmup and sampling phases, and examine the three control parameters of MCMC in the contexts of CPU- and GPU-based hardware. Our discussion centers around three tasks: (1) inference about a latent variable, (2) computation of expectation values and quantiles, and (3) diagnostics to check the reliability of the estimators. This chapter begins with general recommendations on the control parameters of MCMC, which have been battle-tested over the years and often motivate defaults in Bayesian statistical software. Usually we do not know ahead of time how a sampler will interact with a target distribution, and so the choice of MCMC algorithm and its control parameters, tend to be based on experience, re-evaluated after simulations have been obtained and analyzed. The second part of this chapter provides a theoretical motivation for our recommended approach, with pointers to some concerns and open problems. We also examine recent developments on the algorithmic and hardware fronts, which motivate new computational approaches to MCMC.

Show Abstract

Design of Perfectly Conducting Objects That Are Invisible to an Incident Plane Wave

Johan Helsing, S. Jiang, Anders Karlsson

This work concerns the design of perfectly conducting objects that are invisible to an incident transverse magnetic plane wave. The object in question is a finite planar waveguide with a finite periodic array of barriers. By optimizing this array, the amplitude of the scattered field is reduced to less than 10−9 times the amplitude of the incident plane wave everywhere outside the waveguide. To accurately evaluate such minute amplitudes, we employ a recently developed boundary integral equation technique, adapted for objects whose boundaries have endpoints, corners, and branch points.

Show Abstract
  • Previous Page
  • Viewing
  • Next Page
Advancing Research in Basic Science and MathematicsSubscribe to Flatiron Institute announcements and other foundation updates