443 Publications

Deep reinforcement learning in finite-horizon to explore the most probable transition pathway

Jin Guo, Ting Gao, Peng Zhang, J. Han, Jinqiao Duan

In many scientific and engineering problems, noise and nonlinearity are unavoidable and can induce interesting mathematical problems, such as transition phenomena. This paper focuses on efficiently discovering the most probable transition pathway for stochastic dynamical systems using reinforcement learning. With the Onsager–Machlup action functional theory to quantify rare events in stochastic dynamical systems, finding the most probable pathway is equivalent to solving a variational problem on the action functional. When the action functional cannot be explicitly expressed in terms of paths near the reference orbit, the variational problem must be converted into an optimal control problem. First, by integrating terminal prediction into the reinforcement learning framework, we develop a Terminal Prediction Deep Deterministic Policy Gradient (TP-DDPG) algorithm to handle the finite-horizon optimal control problem in a forward manner. Next, we present a convergence analysis of our algorithm for the value function in terms of the neural network's approximation and estimation errors. Finally, we conduct experiments in several dimensions on transition problems arising in applications to illustrate the effectiveness of our algorithm.
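The variational setup can be made concrete with a small sketch: the discretized Onsager–Machlup action of a candidate path for a one-dimensional double-well SDE. The drift, discretization, and constants below are illustrative assumptions, not the paper's TP-DDPG implementation; minimizing this functional over paths with fixed endpoints is the variational problem that the optimal control formulation replaces.

```python
import numpy as np

def om_action(path, dt, drift, sigma):
    """Discretized Onsager-Machlup action for dX = b(X) dt + sigma dW:
    S = 0.5 * sum_i [ ((x_i' - b(x_i)) / sigma)^2 + b'(x_i) ] * dt  (1D sketch)."""
    x = path[:-1]
    dxdt = np.diff(path) / dt
    b = drift(x)
    eps = 1e-6
    bprime = (drift(x + eps) - drift(x - eps)) / (2 * eps)  # divergence correction
    return 0.5 * np.sum(((dxdt - b) / sigma) ** 2 + bprime) * dt

drift = lambda x: x - x**3          # illustrative double-well drift
t = np.linspace(0.0, 1.0, 201)
dt = t[1] - t[0]
straight = -1.0 + 2.0 * t           # naive straight-line transition from -1 to +1
print(om_action(straight, dt, drift, 1.0))
```

A path with a smaller action value is, in the Onsager–Machlup sense, a more probable transition; the most probable pathway is the minimizer over all paths with the same endpoints.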


Envelopes of Horospheres and Weingarten Surfaces in Hyperbolic 3-Space

We derive basic differential geometric formulae for surfaces in hyperbolic space represented as envelopes of horospheres. The dual notion of parallel hypersurfaces is also studied. The representation is applied to prove existence and regularity theorems for Weingarten surfaces in H^3, which satisfy (1-a)K = a(2-H) for some a < 0, and have a specified boundary curve at infinity. These surfaces are shown to be closely connected to conformal mappings of domains in S^2 into the unit disk and provide Riemannian interpretations for some conformal invariants associated with such mappings.
This paper was originally written in 1984, before I learned to use TeX, and was typed by one of the secretaries in the Princeton Math Department. It was, more or less, my first original work after my dissertation. For some reason, I was not able to get this paper published in a timely manner. The results and perspective in this paper have proved to be useful to a variety of people, some of whom asked me to render the article into TeX and post it to the arXiv. I had been seriously thinking about doing this, when Martin Bridgeman sent me a transcription of my original article into TeX. I am extremely grateful to him for the effort he has put into this project.
The paper is now formatted in a more or less modern AMS-article style, but aside from additional punctuation, a few corrections, and some minor stylistic changes, the content has been largely reproduced as it originally was. Remarks about the 'state-of-the-art' in hyperbolic geometry are obviously way out of date, as there has been enormous progress in many aspects of this still-rich subject.


The magnetic gradient scale length explains why certain plasmas require close external magnetic coils

John Kappel, Matt Landreman, D. Malhotra

The separation between the last closed flux surface of a plasma and the external coils that magnetically confine it is a limiting factor in the construction of fusion-capable plasma devices. This plasma-coil separation must be large enough so that components such as a breeding blanket and neutron shielding can fit between the plasma and the coils. Plasma-coil separation affects reactor size, engineering complexity, and particle loss due to field ripple. For some plasmas it can be difficult to produce the desired flux surface shaping with distant coils, and for other plasmas it is infeasible altogether. Here, we seek to understand the underlying physics that limits plasma-coil separation and explain why some configurations require close external coils. In this paper, we explore the hypothesis that the limiting plasma-coil separation is set by the shortest scale length of the magnetic field as expressed by the gradient tensor ∇B. We tested this hypothesis on a database of 40 stellarator and tokamak configurations. Within this database, the coil-to-plasma distance compared to the minor radius varies by over an order of magnitude. The magnetic scale length is well correlated with the coil-to-plasma distance of actual coil designs generated using the REGCOIL method (Landreman 2017 Nucl. Fusion 57 046003). Additionally, this correlation reveals a general trend that larger plasma-coil separation is possible with a small number of field periods.
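As a rough illustration of the quantity in question, the sketch below estimates a gradient-based scale length L_∇B = √2 |B| / ‖∇B‖_F by central finite differences for a toy field that decays on a known scale L. The √2 normalization and the test field are illustrative assumptions, not necessarily the paper's exact definition.

```python
import numpy as np

def grad_scale_length(B, x, h=1e-5):
    """Estimate L = sqrt(2) * |B(x)| / ||grad B(x)||_F by central differences.
    B is a callable mapping a point in R^3 to a field vector in R^3."""
    J = np.zeros((3, 3))
    for j in range(3):
        e = np.zeros(3)
        e[j] = h
        J[:, j] = (B(x + e) - B(x - e)) / (2 * h)   # column j: dB/dx_j
    return np.sqrt(2.0) * np.linalg.norm(B(x)) / np.linalg.norm(J)

L = 0.3                                              # known decay scale of the toy field
B = lambda p: np.array([np.exp(-p[0] / L), 0.0, 0.0])
print(grad_scale_length(B, np.array([0.1, 0.0, 0.0])))  # ≈ sqrt(2) * L
```

For this exponentially decaying field the Frobenius norm of ∇B is |B|/L, so the estimator returns √2·L; a configuration whose shortest such scale length is small would, under the paper's hypothesis, demand close coils.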


Generalization in diffusion models arises from geometry-adaptive harmonic representations

Zahra Kadkhodaie, Florentin Guth, E. P. Simoncelli, S. Mallat

Deep neural networks (DNNs) trained for image denoising are able to generate high-quality samples with score-based reverse diffusion algorithms. These impressive capabilities seem to imply an escape from the curse of dimensionality, but recent reports of memorization of the training set raise the question of whether these networks are learning the "true" continuous density of the data. Here, we show that two DNNs trained on non-overlapping subsets of a dataset learn nearly the same score function, and thus the same density, when the number of training images is large enough. In this regime of strong generalization, diffusion-generated images are distinct from the training set, and are of high visual quality, suggesting that the inductive biases of the DNNs are well-aligned with the data density. We analyze the learned denoising functions and show that the inductive biases give rise to a shrinkage operation in a basis adapted to the underlying image. Examination of these bases reveals oscillating harmonic structures along contours and in homogeneous regions. We demonstrate that trained denoisers are inductively biased towards these geometry-adaptive harmonic bases since they arise not only when the network is trained on photographic images, but also when it is trained on image classes supported on low-dimensional manifolds for which the harmonic basis is suboptimal. Finally, we show that when trained on regular image classes for which the optimal basis is known to be geometry-adaptive and harmonic, the denoising performance of the networks is near-optimal.
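A minimal linear analogue of the shrinkage picture, assuming a Wiener-style denoiser in a fixed orthonormal DCT basis rather than the geometry-adaptive basis a trained DNN learns: the denoiser's eigenvectors form the basis, and its eigenvalues are the per-coefficient shrinkage factors. The power-law prior and noise level are illustrative.

```python
import numpy as np

n, sigma = 64, 0.5

# Toy "prior": signals whose power spectrum decays like 1/k^2 in the DCT basis.
k = np.arange(1, n + 1)
power = 1.0 / k**2

# Wiener shrinkage: each basis coefficient is multiplied by power/(power + sigma^2),
# an eigenvalue in (0, 1] that is largest where the signal density concentrates.
shrink = power / (power + sigma**2)

# Build an orthonormal DCT-II basis U (columns are basis vectors) and the
# denoiser W = U diag(shrink) U^T, then recover the shrinkage from W's spectrum.
j = np.arange(n)
U = np.cos(np.pi * (j[:, None] + 0.5) * j[None, :] / n)
U[:, 0] /= np.sqrt(2)
U *= np.sqrt(2.0 / n)
W = U @ np.diag(shrink) @ U.T
eigvals = np.sort(np.linalg.eigvalsh(W))[::-1]
print(eigvals[:3])   # eigenvalues of the denoiser reproduce the shrinkage factors
```

The paper's analysis applies the same diagnostic to nonlinear trained denoisers (via a local linearization), where the recovered basis turns out to be harmonic and adapted to the image geometry rather than fixed.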


Scaling Laws for Associative Memories

Vivien Cabannes, Elvis Dohmatob, A. Bietti

Learning arguably involves the discovery and memorization of abstract rules. The aim of this paper is to study associative memory mechanisms. Our model is based on high-dimensional matrices consisting of outer products of embeddings, which relates to the inner layers of transformer language models. We derive precise scaling laws with respect to sample size and parameter size, and discuss the statistical efficiency of different estimators, including optimization-based algorithms. We provide extensive numerical experiments to validate and interpret theoretical results, including fine-grained visualizations of the stored memory associations.
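The outer-product memory model can be sketched as follows; the dimensions and the nearest-embedding decoder are illustrative choices, not the paper's estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 256, 40                  # embedding dimension, number of stored associations

# Random input/output embeddings; in high dimension these are nearly orthogonal.
U = rng.standard_normal((N, d)) / np.sqrt(d)   # input embeddings u_i (rows)
V = rng.standard_normal((N, d)) / np.sqrt(d)   # output embeddings v_i (rows)

# Associative memory as a sum of outer products: W = sum_i v_i u_i^T.
W = V.T @ U

# Recall: W u_j ≈ v_j plus cross-talk; decode by the highest-scoring output embedding.
scores = (W @ U.T).T @ V.T                     # scores[j, i] = <W u_j, v_i>
decoded = scores.argmax(axis=1)
accuracy = (decoded == np.arange(N)).mean()
print(accuracy)
```

When N grows relative to d, the cross-talk terms overwhelm the signal and recall degrades; the paper's scaling laws quantify this trade-off between the number of stored associations and the parameter (embedding) size.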


Should Under-parameterized Student Networks Copy or Average Teacher Weights?

B. Şimşek, Amire Bendjeddou, Wulfram Gerstner, Johanni Brea

Any continuous function f∗ can be approximated arbitrarily well by a neural network with sufficiently many neurons k. We consider the case when f∗ itself is a neural network with one hidden layer and k neurons. Approximating f∗ with a neural network with n < k neurons can thus be seen as fitting an under-parameterized “student” network with n neurons to a “teacher” network with k neurons. As the student has fewer neurons than the teacher, it is unclear whether each of the n student neurons should copy one of the teacher neurons or rather average a group of teacher neurons. For shallow neural networks with erf activation function and for the standard Gaussian input distribution, we prove that “copy-average” configurations are critical points if the teacher’s incoming vectors are orthonormal and its outgoing weights are unitary. Moreover, the optimum among such configurations is reached when n − 1 student neurons each copy one teacher neuron and the n-th student neuron averages the remaining k − n + 1 teacher neurons. For the student network with n = 1 neuron, we additionally provide a closed-form solution of the non-trivial critical point(s) for commonly used activation functions through solving an equivalent constrained optimization problem. Empirically, we find for the erf activation function that gradient flow converges either to the optimal copy-average critical point or to another point where each student neuron approximately copies a different teacher neuron. Finally, we find similar results for the ReLU activation function, suggesting that the optimal solution of under-parameterized networks has a universal structure.
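A Monte Carlo sketch of the copy-average idea for k = 3 teacher neurons with erf activation and Gaussian input. The n = 1 comparison student below is a naive averaging configuration, not the paper's closed-form critical point; the parameterizations are illustrative.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)
verf = np.vectorize(erf)

# Teacher: k = 3 hidden neurons with orthonormal incoming vectors (here the
# standard basis) and unit outgoing weights, under standard Gaussian input.
X = rng.standard_normal((200_000, 3))
teacher = verf(X).sum(axis=1)

# Copy-average student with n = 2 neurons: neuron 1 copies teacher neuron 1;
# neuron 2 averages teacher neurons 2 and 3 (incoming weight = their mean,
# outgoing weight = their summed outgoing weights).
copy_avg = verf(X[:, 0]) + 2.0 * verf((X[:, 1] + X[:, 2]) / 2.0)

# Naive n = 1 student that averages all three teacher neurons at once.
full_avg = 3.0 * verf(X.sum(axis=1) / 3.0)

loss_ca = np.mean((teacher - copy_avg) ** 2)
loss_fa = np.mean((teacher - full_avg) ** 2)
print(loss_ca, loss_fa)   # the copy-average student achieves the smaller loss
```

Because erf saturates, averaging more teacher neurons into one student neuron loses more of the teacher's function, which is why the optimum concentrates the averaging into a single student neuron and copies with the rest.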


Solving the Scattering Problem for Open Wave-Guides, III: Radiation Conditions and Uniqueness

C. Epstein, Rafe Mazzeo

This paper continues the analysis of the scattering problem for a network of open wave-guides started in [arXiv:2302.04353, arXiv:2310.05816]. In this part we present explicit, physically motivated radiation conditions that ensure uniqueness of the solution to the scattering problem. These conditions stem from a 2000 paper of A. Vasy on 3-body Schrödinger operators; we discuss closely related conditions from a 1994 paper of H. Isozaki. Vasy's paper also proves the existence of the limiting absorption resolvents, and that the limiting solutions satisfy the radiation conditions. The statements of these results require a calculus of pseudodifferential operators, called the 3-body scattering calculus, which is briefly introduced here. We show that the solutions to the model problem obtained in arXiv:2302.04353 satisfy these radiation conditions, which makes it possible to prove uniqueness, and therefore existence, for the system of Fredholm integral equations introduced in that paper.


Implicit Adaptive Mesh Refinement for Dispersive Tsunami Propagation

M. Berger, Randall J. LeVeque

We present an algorithm to solve the dispersive depth-averaged Serre–Green–Naghdi equations using patch-based adaptive mesh refinement. These equations require adding additional higher derivative terms to the nonlinear shallow water equations. This has been implemented as a new component of the open source GeoClaw software that is widely used for modeling tsunamis, storm surge, and related hazards, improving its accuracy on shorter wavelength phenomena. We use a formulation that requires solving an elliptic system of equations at each time step, making the method implicit. The adaptive algorithm allows different time steps on different refinement levels and solves the implicit equations level by level. Computational examples are presented to illustrate the stability and accuracy on a radially symmetric test case and two realistic tsunami modeling problems, including a hypothetical asteroid impact creating a short wavelength tsunami for which dispersive terms are necessary.

Reproducibility of computational results: this paper has been awarded the “SIAM Reproducibility Badge: Code and data available” in recognition that the authors have followed reproducibility principles valued by SISC and the scientific computing community. Code and data that allow readers to reproduce the results in this paper are available at https://github.com/rjleveque/ImplicitAMR-paper and in the supplementary materials (ImplicitAMR-paper.zip [174KB]).
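The implicit step structure can be illustrated with a toy one-dimensional elliptic solve of the form (I − α ∂ₓₓ)u = f, which stands in for the level-by-level systems solved each time step. The coefficient, right-hand side, and Dirichlet boundary conditions are illustrative, not the actual Serre–Green–Naghdi discretization.

```python
import numpy as np

# Discretize (I - alpha * d^2/dx^2) u = f on (0, 1) with homogeneous
# Dirichlet boundary conditions and second-order central differences.
n, L, alpha = 200, 1.0, 0.01
h = L / (n + 1)
x = np.linspace(h, L - h, n)

A = np.eye(n) * (1.0 + 2.0 * alpha / h**2)
A += np.diag([-alpha / h**2] * (n - 1), 1)
A += np.diag([-alpha / h**2] * (n - 1), -1)

f = np.sin(np.pi * x)
u = np.linalg.solve(A, f)

# Continuum solution: (I - alpha d^2/dx^2) sin(pi x) = (1 + alpha pi^2) sin(pi x).
exact = f / (1.0 + alpha * np.pi**2)
print(np.max(np.abs(u - exact)))   # small second-order discretization error
```

In the actual adaptive algorithm, a system of this general character arises on each refinement level with its own time step, which is why the levels must be solved in sequence rather than with a single global sweep.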


A new version of the adaptive fast Gauss transform for discrete and continuous sources

L. Greengard, S. Jiang, M. Rachh, J. Wang

We present a new version of the fast Gauss transform (FGT) for discrete and continuous sources. Classical Hermite expansions are avoided entirely, making use only of the plane-wave representation of the Gaussian kernel and a new hierarchical merging scheme. For continuous source distributions sampled on adaptive tensor-product grids, we exploit the separable structure of the Gaussian kernel to accelerate the computation. For discrete sources, the scheme relies on the nonuniform fast Fourier transform (NUFFT) to construct near field plane wave representations. The scheme has been implemented for either free-space or periodic boundary conditions. In many regimes, the speed is comparable to or better than that of the conventional FFT in work per gridpoint, despite being fully adaptive.
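The plane-wave representation at the heart of the scheme can be sketched in one dimension: the Gaussian kernel is a Fourier integral, exp(-t²/δ) = √(δ/4π) ∫ exp(-δs²/4) e^{ist} ds, discretized by the trapezoid rule. The expansion length and spacing below are illustrative assumptions, not the paper's parameter choices.

```python
import numpy as np

delta = 0.1                      # Gaussian width parameter
hs = 0.15                        # plane-wave spacing (illustrative)
k = np.arange(-256, 257)
s = k * hs
w = np.sqrt(delta / (4.0 * np.pi)) * hs * np.exp(-delta * s**2 / 4.0)

# Evaluate the discretized plane-wave expansion at a range of separations t
# and compare against the Gaussian kernel directly.
t = np.linspace(-1.0, 1.0, 101)
approx = (w[None, :] * np.exp(1j * t[:, None] * s[None, :])).sum(axis=1).real
exact = np.exp(-t**2 / delta)
print(np.max(np.abs(approx - exact)))   # near machine precision
```

Because the integrand is smooth and decays like a Gaussian, the trapezoid rule converges super-algebraically, which is what makes a short plane-wave expansion an effective replacement for Hermite expansions.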


A new provably stable weighted state redistribution algorithm

We propose a practical finite volume method on cut cells using state redistribution. Our algorithm is provably monotone, total variation diminishing, and GKS (Gustafsson, Kreiss, Sundström) stable in many situations, and shuts off continuously as the cut cell size approaches a target value. Our analysis reveals why the original state redistribution algorithm works so well: it results in a monotone scheme for most configurations, though at times subject to a slightly smaller CFL condition. Our analysis also explains why a premerging step is beneficial. We show computational experiments in two and three dimensions.
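A minimal one-dimensional sketch of the redistribution idea, with a single merging neighborhood and volume-weighted averaging; the actual weighted algorithm uses overlapping neighborhoods and carefully chosen weights, so the geometry and states below are purely illustrative.

```python
import numpy as np

# A small cut cell (volume fraction a << 1) would force a tiny explicit time
# step; state redistribution instead merges its provisional state with a
# neighborhood after the update, by volume-weighted averaging.
a = 0.1
vols = np.array([1.0, 1.0, a])       # two full cells and one cut cell
u = np.array([1.0, 2.0, 5.0])        # provisional (post-update) cell states

total_mass = np.dot(vols, u)

# Merge the cut cell with its left neighbor: one volume-weighted average.
nbhd = [1, 2]
qhat = np.dot(vols[nbhd], u[nbhd]) / vols[nbhd].sum()

# Redistribute: every cell in the neighborhood receives the merged state.
u_new = u.copy()
u_new[nbhd] = qhat
print(u_new, np.dot(vols, u_new))    # total mass is unchanged by the merge
```

The merged value is a convex combination of the neighborhood states, which is the mechanism behind the monotonicity and TVD properties analyzed in the paper, and the volume weighting is what preserves conservation exactly.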
