428 Publications

Should Under-parameterized Student Networks Copy or Average Teacher Weights?

B. Şimşek, Amire Bendjeddou, Wulfram Gerstner, Johanni Brea

Any continuous function f∗ can be approximated arbitrarily well by a neural network with sufficiently many neurons k. We consider the case when f∗ itself is a neural network with one hidden layer and k neurons. Approximating f∗ with a neural network with n < k neurons can thus be seen as fitting an under-parameterized “student” network with n neurons to a “teacher” network with k neurons. As the student has fewer neurons than the teacher, it is unclear, whether each of the n student neurons should copy one of the teacher neurons or rather average a group of teacher neurons. For shallow neural networks with erf activation function and for the standard Gaussian input distribution, we prove that “copy-average” configurations are critical points if the teacher’s incoming vectors are orthonormal and its outgoing weights are unitary. Moreover, the optimum among such configurations is reached when n − 1 student neurons each copy one teacher neuron and the n-th student neuron averages the remaining k − n + 1 teacher neurons. For the student network with n = 1 neuron, we provide additionally a closed-form solution of the non-trivial critical point(s) for commonly used activation functions through solving an equivalent constrained optimization problem. Empirically, we find for the erf activation function that gradient flow converges either to the optimal copy-average critical point or to another point where each student neuron approximately copies a different teacher neuron. Finally, we find similar results for the ReLU activation function, suggesting that the optimal solution of underparameterized networks has a universal structure.

Show Abstract

Solving the Scattering Problem for Open Wave-Guides, III: Radiation Conditions and Uniqueness

C. Epstein, Rafe Mazzeo

This paper continues the analysis of the scattering problem for a network of open wave-guides started in [arXiv:2302.04353, arXiv:2310.05816]. In this part we present explicit, physically motivated radiation conditions that ensure uniqueness of the solution to the scattering problem. These conditions stem from a 2000 paper of A. Vasy on 3-body Schrodinger operators; we discuss closely related conditions from a 1994 paper of H. Isozaki. Vasy's paper also proves the existence of the limiting absorption resolvents, and that the limiting solutions satisfy the radiation conditions. The statements of these results require a calculus of pseudodifferential operators, called the 3-body scattering calculus, which is briefly introduced here. We show that the solutions to the model problem obtained in arXiv:2302.04353 satisfy these radiation conditions, which makes it possible to prove uniqueness, and therefore existence, for the system of Fredholm integral equations introduced in that paper.

Show Abstract

Implicit Adaptive Mesh Refinement for Dispersive Tsunami Propagation

M. Berger, Randall J. LeVeque

We present an algorithm to solve the dispersive depth-averaged Serre–Green–Naghdi equations using patch-based adaptive mesh refinement. These equations require adding additional higher derivative terms to the nonlinear shallow water equations. This has been implemented as a new component of the open source GeoClaw software that is widely used for modeling tsunamis, storm surge, and related hazards, improving its accuracy on shorter wavelength phenomena. We use a formulation that requires solving an elliptic system of equations at each time step, making the method implicit. The adaptive algorithm allows different time steps on different refinement levels and solves the implicit equations level by level. Computational examples are presented to illustrate the stability and accuracy on a radially symmetric test case and two realistic tsunami modeling problems, including a hypothetical asteroid impact creating a short wavelength tsunami for which dispersive terms are necessary. Reproducibility of computational results. This paper has been awarded the “SIAM Reproducibility Badge: Code and data available” as a recognition that the authors have followed reproducibility principles valued by SISC and the scientific computing community. Code and data that allow readers to reproduce the results in this paper are available at https://github.com/rjleveque/ImplicitAMR-paper and in the supplementary materials (ImplicitAMR-paper.zip [174KB]).

Show Abstract

A new version of the adaptive fast Gauss transform for discrete and continuous sources

L. Greengard, S. Jiang, M. Rachh, J. Wang

We present a new version of the fast Gauss transform (FGT) for discrete and continuous sources. Classical Hermite expansions are avoided entirely, making use only of the plane-wave representation of the Gaussian kernel and a new hierarchical merging scheme. For continuous source distributions sampled on adaptive tensor-product grids, we exploit the separable structure of the Gaussian kernel to accelerate the computation. For discrete sources, the scheme relies on the nonuniform fast Fourier transform (NUFFT) to construct near field plane wave representations. The scheme has been implemented for either free-space or periodic boundary conditions. In many regimes, the speed is comparable to or better than that of the conventional FFT in work per gridpoint, despite being fully adaptive.

Show Abstract

A new provably stable weighted state redistribution algorithm

We propose a practical finite volume method on cut cells using state redistribution. Our algorithm is provably monotone, total variation diminishing, and GKS (Gustafsson, Kreiss, Sundström) stable in many situations, and shuts off continuously as the cut cell size approaches a target value. Our analysis reveals why original state redistribution works so well: it results in a monotone scheme for most configurations, though at times subject to a slightly smaller CFL condition. Our analysis also explains why a premerging step is beneficial. We show computational experiments in two and three dimensions.

Show Abstract

Uniform approximation of common Gaussian process kernels using equispaced Fourier grids

A. Barnett, Philip Greengard, Ph.D., M. Rachh

The high efficiency of a recently proposed method for computing with Gaussian processes relies on expanding a (translationally invariant) covariance kernel into complex exponentials, with frequencies lying on a Cartesian equispaced grid. Here we provide rigorous error bounds for this approximation for two popular kernels—Matérn and squared exponential—in terms of the grid spacing and size. The kernel error bounds are uniform over a hypercube centered at the origin. Our tools include a split into aliasing and truncation errors, and bounds on sums of Gaussians or modified Bessel functions over various lattices. For the Matérn case, motivated by numerical study, we conjecture a stronger Frobenius-norm bound on the covariance matrix error for randomly-distributed data points. Lastly, we prove bounds on, and study numerically, the ill-conditioning of the linear systems arising in such regression problems.

Show Abstract

Directional Smoothness and Gradient Methods: Convergence and Adaptivity

Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, R. M. Gower

We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization, rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective. Minimizing these upper-bounds requires solving implicit equations to obtain a sequence of strongly adapted step-sizes; we show that these equations are straightforward to solve for convex quadratics and lead to new guarantees for two classical step-sizes. For general functions, we prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness. Experiments on logistic regression show our convergence guarantees are tighter than the classical theory based on -smoothness.

Show Abstract

Learning High-Dimensional McKean–Vlasov Forward-Backward Stochastic Differential Equations with General Distribution Dependence

J. Han, Ruimeng Hu, Jihao Long

One of the core problems in mean-field control and mean-field games is to solve the corresponding McKean–Vlasov forward-backward stochastic differential equations (MV-FBSDEs). Most existing methods are tailored to special cases in which the mean-field interaction only depends on expectation or other moments and thus are inadequate to solve problems when the mean-field interaction has full distribution dependence. In this paper, we propose a novel deep learning method for computing MV-FBSDEs with a general form of mean-field interactions. Specifically, built on fictitious play, we recast the problem into repeatedly solving standard FBSDEs with explicit coefficient functions. These coefficient functions are used to approximate the MV-FBSDEs’ model coefficients with full distribution dependence, and are updated by solving another supervising learning problem using training data simulated from the last iteration’s FBSDE solutions. We use deep neural networks to solve standard BSDEs and approximate coefficient functions in order to solve high-dimensional MV-FBSDEs. Under proper assumptions on the learned functions, we prove that the convergence of the proposed method is free of the curse of dimensionality (CoD) by using a class of integral probability metrics previously developed in [J. Han, R. Hu, and J. Long, Stochastic Process. Appl., 164 (2023), pp. 242–287]. The proved theorem shows the advantage of the method in high dimensions. We present the numerical performance in high-dimensional MV-FBSDE problems, including a mean-field game example of the well-known Cucker–Smale model, the cost of which depends on the full distribution of the forward process.

Show Abstract

Stochastic Optimal Control Matching

Carles Domingo-Enrich, J. Han, Brandon Amos, Joan Bruna, Ricky T. Q. Chen

Stochastic optimal control, which has the goal of driving the behavior of noisy systems, is broadly applicable in science, engineering and artificial intelligence. Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffusion models. That is, the control is learned via a least squares problem by trying to fit a matching vector field. The training loss, which is closely connected to the cross-entropy loss, is optimized with respect to both the control function and a family of reparameterization matrices which appear in the matching vector f ield. The optimization with respect to the reparameterization matrices aims at minimizing the variance of the matching vector field. Experimentally, our algorithm achieves lower error than all the existing IDO techniques for stochastic optimal control for three out of four control problems, in some cases by an order of magnitude. The key idea underlying SOCM is the path-wise reparameterization trick, a novel technique that may be of independent interest.

Show Abstract

A high-order fast direct solver for surface PDEs

We introduce a fast direct solver for variable-coefficient elliptic PDEs on surfaces based on the hierarchical Poincaré–Steklov method. The method takes as input an unstructured, high-order quadrilateral mesh of a surface and discretizes surface differential operators on each element using a high-order spectral collocation scheme. Elemental solution operators and Dirichlet-to-Neumann maps tangent to the surface are precomputed and merged in a pairwise fashion to yield a hierarchy of solution operators that may be applied in \(\mathcal{O}(N \log N)\) operations for a mesh with \(N\) degrees of freedom. The resulting fast direct solver may be used to accelerate high-order implicit time-stepping schemes, as the precomputed operators can be reused for fast elliptic solves on surfaces. On a standard laptop, precomputation for a 12th-order surface mesh with over 1 million degrees of freedom takes 10 seconds, while subsequent solves take only 0.25 seconds. We apply the method to a range of problems on both smooth surfaces and surfaces with sharp corners and edges, including the static Laplace–Beltrami problem, the Hodge decomposition of a tangential vector field, and some time-dependent nonlinear reaction-diffusion systems. Reproducibility of computational results. This paper has been awarded the “SIAM Reproducibility Badge: code and data available”, as a recognition that the authors have followed reproducibility principles valued by SISC and the scientific computing community. Code and data that allow readers to reproduce the results in this paper are available at https://github.com/danfortunato/surface-hps-sisc.

Show Abstract
  • Previous Page
  • Viewing
  • Next Page
Advancing Research in Basic Science and MathematicsSubscribe to Flatiron Institute announcements and other foundation updates