443 Publications

Uniform approximation of common Gaussian process kernels using equispaced Fourier grids

A. Barnett, Philip Greengard, Ph.D., M. Rachh

The high efficiency of a recently proposed method for computing with Gaussian processes relies on expanding a (translationally invariant) covariance kernel into complex exponentials, with frequencies lying on a Cartesian equispaced grid. Here we provide rigorous error bounds for this approximation for two popular kernels—Matérn and squared exponential—in terms of the grid spacing and size. The kernel error bounds are uniform over a hypercube centered at the origin. Our tools include a split into aliasing and truncation errors, and bounds on sums of Gaussians or modified Bessel functions over various lattices. For the Matérn case, motivated by numerical study, we conjecture a stronger Frobenius-norm bound on the covariance matrix error for randomly-distributed data points. Lastly, we prove bounds on, and study numerically, the ill-conditioning of the linear systems arising in such regression problems.

Show Abstract

Directional Smoothness and Gradient Methods: Convergence and Adaptivity

Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, R. M. Gower

We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization, rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective. Minimizing these upper-bounds requires solving implicit equations to obtain a sequence of strongly adapted step-sizes; we show that these equations are straightforward to solve for convex quadratics and lead to new guarantees for two classical step-sizes. For general functions, we prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness. Experiments on logistic regression show our convergence guarantees are tighter than the classical theory based on -smoothness.

Show Abstract

Learning High-Dimensional McKean–Vlasov Forward-Backward Stochastic Differential Equations with General Distribution Dependence

J. Han, Ruimeng Hu, Jihao Long

One of the core problems in mean-field control and mean-field games is to solve the corresponding McKean–Vlasov forward-backward stochastic differential equations (MV-FBSDEs). Most existing methods are tailored to special cases in which the mean-field interaction only depends on expectation or other moments and thus are inadequate to solve problems when the mean-field interaction has full distribution dependence. In this paper, we propose a novel deep learning method for computing MV-FBSDEs with a general form of mean-field interactions. Specifically, built on fictitious play, we recast the problem into repeatedly solving standard FBSDEs with explicit coefficient functions. These coefficient functions are used to approximate the MV-FBSDEs’ model coefficients with full distribution dependence, and are updated by solving another supervising learning problem using training data simulated from the last iteration’s FBSDE solutions. We use deep neural networks to solve standard BSDEs and approximate coefficient functions in order to solve high-dimensional MV-FBSDEs. Under proper assumptions on the learned functions, we prove that the convergence of the proposed method is free of the curse of dimensionality (CoD) by using a class of integral probability metrics previously developed in [J. Han, R. Hu, and J. Long, Stochastic Process. Appl., 164 (2023), pp. 242–287]. The proved theorem shows the advantage of the method in high dimensions. We present the numerical performance in high-dimensional MV-FBSDE problems, including a mean-field game example of the well-known Cucker–Smale model, the cost of which depends on the full distribution of the forward process.

Show Abstract

Stochastic Optimal Control Matching

Carles Domingo-Enrich, J. Han, Brandon Amos, Joan Bruna, Ricky T. Q. Chen

Stochastic optimal control, which has the goal of driving the behavior of noisy systems, is broadly applicable in science, engineering and artificial intelligence. Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffusion models. That is, the control is learned via a least squares problem by trying to fit a matching vector field. The training loss, which is closely connected to the cross-entropy loss, is optimized with respect to both the control function and a family of reparameterization matrices which appear in the matching vector f ield. The optimization with respect to the reparameterization matrices aims at minimizing the variance of the matching vector field. Experimentally, our algorithm achieves lower error than all the existing IDO techniques for stochastic optimal control for three out of four control problems, in some cases by an order of magnitude. The key idea underlying SOCM is the path-wise reparameterization trick, a novel technique that may be of independent interest.

Show Abstract

A high-order fast direct solver for surface PDEs

We introduce a fast direct solver for variable-coefficient elliptic PDEs on surfaces based on the hierarchical Poincaré–Steklov method. The method takes as input an unstructured, high-order quadrilateral mesh of a surface and discretizes surface differential operators on each element using a high-order spectral collocation scheme. Elemental solution operators and Dirichlet-to-Neumann maps tangent to the surface are precomputed and merged in a pairwise fashion to yield a hierarchy of solution operators that may be applied in \(\mathcal{O}(N \log N)\) operations for a mesh with \(N\) degrees of freedom. The resulting fast direct solver may be used to accelerate high-order implicit time-stepping schemes, as the precomputed operators can be reused for fast elliptic solves on surfaces. On a standard laptop, precomputation for a 12th-order surface mesh with over 1 million degrees of freedom takes 10 seconds, while subsequent solves take only 0.25 seconds. We apply the method to a range of problems on both smooth surfaces and surfaces with sharp corners and edges, including the static Laplace–Beltrami problem, the Hodge decomposition of a tangential vector field, and some time-dependent nonlinear reaction-diffusion systems. Reproducibility of computational results. This paper has been awarded the “SIAM Reproducibility Badge: code and data available”, as a recognition that the authors have followed reproducibility principles valued by SISC and the scientific computing community. Code and data that allow readers to reproduce the results in this paper are available at https://github.com/danfortunato/surface-hps-sisc.

Show Abstract

Explainable Equivariant Neural Networks for Particle Physics: PELICAN

A. Bogatskii, Timothy Hoffman, David W. Miller, Jan T. Offermann, Xiaoyang Liu

PELICAN is a novel permutation equivariant and Lorentz invariant or covariant aggregator network designed to overcome common limitations found in architectures applied to particle physics problems. Compared to many approaches that use non-specialized architectures that neglect underlying physics principles and require very large numbers of parameters, PELICAN employs a fundamentally symmetry group-based architecture that demonstrates benefits in terms of reduced complexity, increased interpretability, and raw performance. We present a comprehensive study of the PELICAN algorithm architecture in the context of both tagging (classification) and reconstructing (regression) Lorentz-boosted top quarks, including the difficult task of specifically identifying and measuring the $W$-boson inside the dense environment of the Lorentz-boosted top-quark hadronic final state. We also extend the application of PELICAN to the tasks of identifying quark-initiated vs.~gluon-initiated jets, and a multi-class identification across five separate target categories of jets. When tested on the standard task of Lorentz-boosted top-quark tagging, PELICAN outperforms existing competitors with much lower model complexity and high sample efficiency. On the less common and more complex task of 4-momentum regression, PELICAN also outperforms hand-crafted, non-machine learning algorithms. We discuss the implications of symmetry-restricted architectures for the wider field of machine learning for physics.

Show Abstract

Provable convergence guarantees for black-box variational inference

Justin Domke, R. M. Gower, Guillaume Garrigos

Black-box variational inference is widely used in situations where there is no proof that its stochastic optimization succeeds. We suggest this is due to a theoretical gap in existing stochastic optimization proofs—namely the challenge of gradient estimators with unusual noise bounds, and a composite non-smooth objective. For dense Gaussian variational families, we observe that existing gradient estimators based on reparameterization satisfy a quadratic noise bound and give novel convergence guarantees for proximal and projected stochastic gradient descent using this bound. This provides rigorous guarantees that methods similar to those used in practice converge on realistic inference problems.

Show Abstract

Sharp error estimates for target measure diffusion maps with applications to the committor problem

Shashank Sule, L. Evans, Maria Cameron

We obtain asymptotically sharp error estimates for the consistency error of the Target Measure Diffusion map (TMDmap) (Banisch et al. 2020), a variant of diffusion maps featuring importance sampling and hence allowing input data drawn from an arbitrary density. The derived error estimates include the bias error and the variance error. The resulting convergence rates are consistent with the approximation theory of graph Laplacians. The key novelty of our results lies in the explicit quantification of all the prefactors on leading-order terms. We also prove an error estimate for solutions of Dirichlet BVPs obtained using TMDmap, showing that the solution error is controlled by consistency error. We use these results to study an important application of TMDmap in the analysis of rare events in systems governed by overdamped Langevin dynamics using the framework of transition path theory (TPT). The cornerstone ingredient of TPT is the solution of the committor problem, a boundary value problem for the backward Kolmogorov PDE. Remarkably, we find that the TMDmap algorithm is particularly suited as a meshless solver to the committor problem due to the cancellation of several error terms in the prefactor formula. Furthermore, significant improvements in bias and variance errors occur when using a quasi-uniform sampling density. Our numerical experiments show that these improvements in accuracy are realizable in practice when using $\delta$-nets as spatially uniform inputs to the TMDmap algorithm.

Show Abstract

Improving Convergence and Generalization Using Parameter Symmetries

Bo Zhao, R. M. Gower, Robin Walters, Rose Yu

In overparametrized models, different parameter values may result in the same loss. Parameter space symmetries are loss-invariant transformations that change the model parameters. Teleportation applies such transformations to accelerate optimization. However, the exact mechanism behind this algorithm's success is not well understood. In this paper, we prove that teleportation gives overall faster time to convergence. Additionally, teleporting to minima with different curvatures improves generalization, which suggests a connection between the curvature of the minima and generalization ability. Finally, we show that integrating teleportation into optimization-based meta-learning improves convergence over traditional algorithms that perform only local updates. Our results showcase the versatility of teleportation and demonstrate the potential of incorporating symmetry in optimization.

Show Abstract

Microscopic Theory, Analysis, and Interpretation of Conductance Histograms in Molecular Junctions

Leopoldo Mejía, P. Cossio, Ignacio Franco

Molecular electronics break-junction experiments are widely used to investigate fundamental physics and chemistry at the nanoscale. Reproducibility in these experiments relies on measuring conductance on thousands of freshly formed molecular junctions, yielding a broad histogram of conductance events. Experiments typically focus on the most probable conductance, while the information content of the conductance histogram has remained unclear. Here we develop a microscopic theory for the conductance histogram by merging the theory of force-spectroscopy with molecular conductance. The procedure yields analytical equations that accurately fit the conductance histogram of a wide range of molecular junctions and augments the information content that can be extracted from them. Our formulation captures contributions to the conductance dispersion due to conductance changes during the mechanical elongation inherent to the experiments. In turn, the histogram shape is determined by the non-equilibrium stochastic features of junction rupture and formation. The microscopic parameters in the theory capture the junction’s electromechanical properties and can be isolated from separate conductance and rupture force (or junction-lifetime) measurements. The predicted behavior can be used to test the range of validity of the theory, understand the conductance histograms, design molecular junction experiments with enhanced resolution and molecular devices with more reproducible conductance properties.

Show Abstract
  • Previous Page
  • Viewing
  • Next Page
Advancing Research in Basic Science and MathematicsSubscribe to Flatiron Institute announcements and other foundation updates