421 Publications

Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data

Atsushi Nitanda, Kazusato Oko, Taiji Suzuki, D. Wu

Recent works have shown that neural networks optimized by gradient-based methods can adapt to sparse or low-dimensional target functions through feature learning; an often studied target is the sparse parity function on the unit hypercube. However, such isotropic data setting does not capture the anisotropy and low intrinsic dimensionality exhibited in realistic datasets. In this work, we address this shortcoming by studying how gradient-based feature learning interacts with structured (anisotropic) input data: we consider the classification of -sparse parity on high-dimensional orthotope where the feature coordinates have varying magnitudes, and analyze the learning complexity of the mean-field Langevin dynamics (MFLD), which describes the noisy gradient descent update on two-layer neural network. We show that the statistical complexity (i.e. sample size) and computational complexity (i.e. network width) of MFLD can both be improved when prominent directions of the anisotropic input data align with the support of the target function. Moreover, by employing a coordinate transform determined by the gradient covariance, the width can be made independent of the target degree. Lastly, we demonstrate the benefit of feature learning by establishing a kernel lower bound on the classification error, which applies to neural networks in the lazy regime.

Show Abstract

Efficient convergent boundary integral methods for slender bodies

The interaction of fibers in a viscous (Stokes) fluid plays a crucial role in industrial and biological processes, such as sedimentation, rheology, transport, cell division, and locomotion. Numerical simulations generally rely on slender body theory (SBT), an asymptotic, nonconvergent approximation whose error blows up as fibers approach each other. Yet convergent boundary integral equation (BIE) methods which completely resolve the fiber surface have so far been impractical due to the prohibitive cost of layer-potential quadratures in such high aspect-ratio 3D geometries. We present a high-order Nyström quadrature scheme with aspect-ratio independent cost, making such BIEs practical. It combines centerline panels (each with a small number of poloidal Fourier modes), toroidal Green's functions, generalized Chebyshev quadratures, HPC parallel implementation, and FMM acceleration. We also present new BIE formulations for slender bodies that lead to well conditioned linear systems upon discretization. We test Laplace and Stokes Dirichlet problems, and Stokes mobility problems, for slender rigid closed fibers with (possibly varying) circular cross-section, at separations down to 1/20 of the slender radius, reporting convergence typically to at least 10 digits. We use this to quantify the breakdown of numerical SBT for close-to-touching rigid fibers. We also apply the methods to time-step the sedimentation of 512 loops with up to 1.65 million unknowns at around 7 digits of accuracy.

Show Abstract

Galaxy clustering analysis with SimBIG and the wavelet scattering transform

B. Régaldo-Saint Blancard, ChangHoon Hahn, Shirley Ho, Jiamin Hou, Pablo Lemos, Elena Massara , C. Modi, Azadeh Moradinezhad Dizgah, Liam Parker, Y. Yao, M. Eickenberg

The non-Gaussian spatial distribution of galaxies traces the large-scale structure of the Universe and therefore constitutes a prime observable to constrain cosmological parameters. We conduct Bayesian inference of the Λ CDM parameters Ωm, Ωb, h , ns, and σ8 from the Baryon Oscillation Spectroscopic Survey CMASS galaxy sample by combining the wavelet scattering transform (WST) with a simulation-based inference approach enabled by the SimBIG forward model. We design a set of reduced WST statistics that leverage symmetries of redshift-space data. Posterior distributions are estimated with a conditional normalizing flow trained on 20,000 simulated SimBIG galaxy catalogs with survey realism. We assess the accuracy of the posterior estimates using simulation-based calibration and quantify generalization and robustness to the change of forward model using a suite of 2000 test simulations. When probing scales down to kmax=0.5 h /Mpc , we are able to derive accurate posterior estimates that are robust to the change of forward model for all parameters, except σ8. We mitigate the robustness issues with σ8 by removing the WST coefficients that probe scales smaller than k ∼0.3 h /Mpc . Applied to the Baryon Oscillation Spectroscopic Survey CMASS sample, our WST analysis yields seemingly improved constraints obtained from a standard perturbation-theory-based power spectrum analysis with kmax=0.25 h /Mpc for all parameters except h . However, we still raise concerns on these results. The observational predictions significantly vary across different normalizing flow architectures, which we interpret as a form of model misspecification. This highlights a key challenge for forward modeling approaches when using summary statistics that are sensitive to detailed model-specific or observational imprints on galaxy clustering.

Show Abstract

Escaping saddle points efficiently with occupation-time-adapted perturbation

Xin Guo , J. Han, Mahan Tajrobehkar , Wenpin Tang

Motivated by the super-diffusivity of self-repelling random walk, which has roots in statistical physics, this paper develops a new perturbation mechanism for optimization algorithms. In this mechanism, perturbations are adapted to the history of states via the notion of occupation time. After integrating this mechanism into the framework of perturbed gradient descent (PGD) and perturbed accelerated gradient descent (PAGD), two new algorithms are proposed: perturbed gradient descent adapted to occupation time (PGDOT) and its accelerated version (PAGDOT). PGDOT and PAGDOT are guaranteed to avoid getting stuck at non-degenerate saddle points, and are shown to converge to second-order stationary points at least as fast as PGD and PAGD, respectively. The theoretical analysis is corroborated by empirical studies in which the new algorithms consistently escape saddle points and outperform not only their counterparts, PGD and PAGD, but also other popular alternatives including stochastic gradient descent, Adam, and several state-of-the-art adaptive gradient methods.

Show Abstract

Statistical Component Separation for Targeted Signal Recovery in Noisy Mixtures

B. Régaldo-Saint Blancard, M. Eickenberg

Separating signals from an additive mixture may be an unnecessarily hard problem when one is only interested in specific properties of a given signal. In this work, we tackle simpler "statistical component separation" problems that focus on recovering a predefined set of statistical descriptors of a target signal from a noisy mixture. Assuming access to samples of the noise process, we investigate a method devised to match the statistics of the solution candidate corrupted by noise samples with those of the observed mixture. We first analyze the behavior of this method using simple examples with analytically tractable calculations. Then, we apply it in an image denoising context employing 1) wavelet-based descriptors, 2) ConvNet-based descriptors on astrophysics and ImageNet data. In the case of 1), we show that our method better recovers the descriptors of the target data than a standard denoising method in most situations. Additionally, despite not constructed for this purpose, it performs surprisingly well in terms of peak signal-to-noise ratio on full signal reconstruction. In comparison, representation 2) appears less suitable for image denoising. Finally, we extend this method by introducing a diffusive stepwise algorithm which gives a new perspective to the initial method and leads to promising results for image denoising under specific circumstances.

Show Abstract

Nonlinear spiked covariance matrices and signal propagation in deep neural networks

Zhichao Wang, D. Wu, Zhou Fan

Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) def ined by the nonlinear feature map of a feedforward neural network. However, existing results only establish weak convergence of the empirical eigenvalue distribution, and fall short of providing precise quantitative characterizations of the “spike” eigenvalues and eigenvectors that often capture the low-dimensional signal structure of the learning problem. In this work, we characterize these signal eigenvalues and eigenvectors for a nonlinear version of the spiked covariance model, including the CK as a special case. Using this general result, we give a quantitative description of how spiked eigenstructure in the input data propagates through the hidden layers of a neural network with random weights. As a second application, we study a simple regime of representation learning where the weight matrix develops a rank-one signal component over training and characterize the alignment of the target function with the spike eigenvector of the CK on test data.

Show Abstract

For how many iterations should we run Markov chain Monte Carlo?

C. Margossian, Andrew Gelman

Standard Markov chain Monte Carlo (MCMC) admits three fundamental control parameters: the number of chains, the length of the warmup phase, and the length of the sampling phase. These control parameters play a large role in determining the amount of computation we deploy. In practice, we need to walk a line between achieving sufficient precision and not wasting precious computational resources and time. We review general strategies to check the length of the warmup and sampling phases, and examine the three control parameters of MCMC in the contexts of CPU- and GPU-based hardware. Our discussion centers around three tasks: (1) inference about a latent variable, (2) computation of expectation values and quantiles, and (3) diagnostics to check the reliability of the estimators. This chapter begins with general recommendations on the control parameters of MCMC, which have been battle-tested over the years and often motivate defaults in Bayesian statistical software. Usually we do not know ahead of time how a sampler will interact with a target distribution, and so the choice of MCMC algorithm and its control parameters, tend to be based on experience, re-evaluated after simulations have been obtained and analyzed. The second part of this chapter provides a theoretical motivation for our recommended approach, with pointers to some concerns and open problems. We also examine recent developments on the algorithmic and hardware fronts, which motivate new computational approaches to MCMC.

Show Abstract

Design of Perfectly Conducting Objects That Are Invisible to an Incident Plane Wave

Johan Helsing, S. Jiang, Anders Karlsson

This work concerns the design of perfectly conducting objects that are invisible to an incident transverse magnetic plane wave. The object in question is a finite planar waveguide with a finite periodic array of barriers. By optimizing this array, the amplitude of the scattered field is reduced to less than 10−9 times the amplitude of the incident plane wave everywhere outside the waveguide. To accurately evaluate such minute amplitudes, we employ a recently developed boundary integral equation technique, adapted for objects whose boundaries have endpoints, corners, and branch points.

Show Abstract

Deep reinforcement learning in finite-horizon to explore the most probable transition pathway

Jin Guo, Ting Gao, Peng Zhang, J. Han, Jinqiao Duan

In many scientific and engineering problems, noise and nonlinearity are unavoidable, which could induce interesting mathematical problem such as transition phenomena. This paper focuses on efficiently discovering the most probable transition pathway for stochastic dynamical systems employing reinforcement learning. With the Onsager–Machlup action functional theory to quantify rare events in stochastic dynamical systems, finding the most probable pathway is equivalent to solving a variational problem on the action functional. When the action function cannot be explicitly expressed by paths near the reference orbit, the variational problem needs to be converted into an optimal control problem. First, by integrating terminal prediction into the reinforcement learning framework, we develop a Terminal Prediction Deep Deterministic Policy Gradient (TP-DDPG) algorithm to deal with the finite-horizon optimal control issue in a forward way. Next, we present the convergence analysis of our algorithm for the value function in terms of the neural network’s approximation error and estimation error. Finally, we conduct various experiments in different dimensions for the transition problems in applications to illustrate the effectiveness of our algorithm.

Show Abstract

Envelopes of Horospheres and Weingarten Surfaces in Hyperbolic 3-Space

We derive basic differential geometric formulae for surfaces in hyperbolic space represented as envelopes of horospheres. The dual notion of parallel hypersurfaces is also studied. The representation is applied to prove existence and regularity theorems for Weingarten surfaces in H^3, which satisfy (1-a)K = a(2-H), for an a < 0, and have a specified boundary curve at infinity. These surfaces are shown to be closely connected to conformal mappings of domains in S^2 into the unit disk and provide Riemannian interpretations for some conformal invariants associated to such mappings.
This paper was originally written in 1984, before I learned to use TeX, and was typed by one of the secretaries in the Princeton Math Department. It was more or less, my first original work after my dissertation. For some reason, I was not able to get this paper published in a timely manner. The results and perspective in this paper have proved to be useful to a variety of people, some of whom asked me to render the article into TeX and post it to the arXiv. I had been seriously thinking about doing this, when Martin Bridgemen sent me a transcription of my original article into TeX. I am extremely grateful to him for the effort he has put into this project.
The paper is now formatted in a more or less modern AMS-article style, but for lots of additional punctuation, a few corrections and some minor stylistic changes, the content has been largely reproduced as it originally was. Remarks about the 'state-of-the-art' in hyperbolic geometry are obviously way out of date, as there has been enormous progress in many aspects of this still rich subject.

Show Abstract
  • Previous Page
  • Viewing
  • Next Page
Advancing Research in Basic Science and MathematicsSubscribe to Flatiron Institute announcements and other foundation updates