443 Publications

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

Jason D. Lee, Kazusato Oko, Taiji Suzuki, D. Wu

We study the problem of gradient descent learning of a single-index target function \(f_*(x) = \sigma_*(\langle x, \theta \rangle)\) under isotropic Gaussian data in \(\mathbb{R}^d\), where the unknown link function \(\sigma_*: \mathbb{R} \to \mathbb{R}\) has information exponent \(p\) (defined as the lowest degree \(k \ge 1\) with a nonzero coefficient in its Hermite expansion). Prior works showed that gradient-based training of neural networks can learn this target with \(n \gtrsim d^{\Theta(p)}\) samples, and such complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm (on the squared loss) learns \(f_*\) with a complexity that is not governed by the information exponent. Specifically, for arbitrary polynomial single-index models, we establish a sample and runtime complexity of \(n \simeq T = \Theta(d \cdot \mathrm{polylog}\, d)\), where \(\Theta(\cdot)\) hides a constant only depending on the degree of \(\sigma_*\); this dimension dependence matches the information-theoretic limit up to polylogarithmic factors. More generally, we show that \(n \gtrsim d^{(p_*-1) \vee 1}\) samples are sufficient to achieve low generalization error, where \(p_* \le p\) is the generative exponent of the link function. Core to our analysis is the reuse of minibatches in the gradient computation, which gives rise to higher-order information beyond correlational queries.
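
The information exponent defined above can be read off mechanically for a polynomial link function. The following is a minimal sketch using NumPy's probabilists' Hermite module (`numpy.polynomial.hermite_e`); normalizing the Hermite polynomials would rescale the coefficients but not change which ones vanish. The function name `information_exponent` and the tolerance are illustrative choices, not taken from the paper.

```python
import numpy as np
from numpy.polynomial import hermite_e as He


def information_exponent(power_coeffs, tol=1e-12):
    """Lowest degree k >= 1 with a nonzero Hermite coefficient.

    power_coeffs: monomial coefficients of sigma_*, lowest degree first,
    e.g. [0, -3, 0, 1] means x**3 - 3*x.
    Returns None for a constant link function.
    """
    # Convert from the monomial basis to probabilists' Hermite polynomials He_k,
    # which are orthogonal under the standard Gaussian measure.
    herm = He.poly2herme(np.asarray(power_coeffs, dtype=float))
    for k in range(1, len(herm)):  # skip the constant term He_0
        if abs(herm[k]) > tol:
            return k
    return None


print(information_exponent([0, 0, 0, 1]))   # x^3 = He_3 + 3 He_1  -> 1
print(information_exponent([0, 0, 1]))      # x^2 = He_2 + He_0    -> 2
print(information_exponent([0, -3, 0, 1]))  # He_3 = x^3 - 3x      -> 3
```

The first case shows that the information exponent depends on the Hermite basis rather than the polynomial degree: the monomial \(x^3\) has \(p = 1\), while \(\mathrm{He}_3\) has \(p = 3\).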

Liquid Filled Elastomers: From Linearization to Elastic Enhancement

Juan Casado Díaz, G. Francfort

Surface tension at cavity walls can play havoc with the mechanical properties of perforated soft solids when the cavities are filled with a fluid. This study is an investigation of the macroscopic elastic properties of elastomers embedding spherical cavities filled with a pressurized liquid in the presence of surface tension, starting with the linearization of the fully nonlinear model and ending with the enhancement properties of the linearized model when many such liquid filled cavities are present.

Trapped acoustic waves and raindrops: high-order accurate integral equation method for localized excitation of a periodic staircase

F. Agocs, A. Barnett

We present a high-order boundary integral equation (BIE) method for the frequency-domain acoustic scattering of a point source by a singly-periodic, infinite, corrugated boundary. We apply it to the accurate numerical study of acoustic radiation in the neighborhood of a sound-hard two-dimensional staircase modeled after the El Castillo pyramid. Such staircases support trapped waves which travel along the surface and decay exponentially away from it. We use the array scanning method (Floquet–Bloch transform) to recover the scattered field as an integral over the family of quasiperiodic solutions parameterized by on-surface wavenumber. Each such BIE solution requires the quasiperiodic Green's function, which we evaluate using an efficient integral representation of lattice sum coefficients. We avoid the singularities and branch cuts present in the array scanning integral by complex contour deformation. For each frequency, this enables a solution accurate to around 10 digits in a few seconds. We propose a residue method to extract the limiting powers carried by trapped modes far from the source. Finally, by computing the trapped mode dispersion relation, we use a simple ray model to explain an acoustic chirp-like time-domain response that is referred to in the literature as the “raindrop effect.”
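
The array scanning idea, recovering the field of a single source by integrating quasiperiodic solutions over the Bloch wavenumber, can be checked on a toy problem far simpler than the BIE setting above. This sketch assumes a hypothetical 1D discrete lattice equation \(u_{n+1} - 2u_n + u_{n-1} + \omega^2 u_n = \delta_{n0}\), for which each quasiperiodic solution is known in closed form. It adds damping (complex \(\omega^2\)) to keep the integrand away from singularities, in place of the complex contour deformation used in the paper, and compares against a direct solve on a long truncated chain.

```python
import numpy as np


def array_scanning_green(n, omega2, M=256):
    """Field at sites n from a single point source at site 0, via array scanning.

    For each Bloch wavenumber kappa, v_n = exp(i kappa n) / symbol(kappa) solves the
    lattice equation driven by the phased source array exp(i kappa n); averaging
    over kappa in [-pi, pi) (trapezoidal rule, M nodes) isolates the single source.
    """
    kappa = -np.pi + 2 * np.pi * np.arange(M) / M
    symbol = 2 * np.cos(kappa) - 2 + omega2  # nonzero because Im(omega2) != 0
    phases = np.exp(1j * np.outer(np.asarray(n), kappa))
    return np.mean(phases / symbol, axis=1)


def direct_green(n, omega2, N=401):
    """Same field from a direct solve on a long truncated chain (zero boundary values)."""
    c = N // 2
    A = (np.diag(np.full(N, -2 + omega2, dtype=complex))
         + np.diag(np.ones(N - 1), 1)
         + np.diag(np.ones(N - 1), -1))
    rhs = np.zeros(N, dtype=complex)
    rhs[c] = 1.0
    u = np.linalg.solve(A, rhs)
    return u[c + np.asarray(n)]


n = np.arange(-5, 6)
omega2 = 1.0 + 0.5j  # damped "frequency", so the field decays away from the source
print(np.max(np.abs(array_scanning_green(n, omega2) - direct_green(n, omega2))))
```

The trapezoidal rule converges exponentially here because the integrand is smooth and periodic in \(\kappa\). With zero damping and real \(0 < \omega^2 < 4\), the symbol vanishes at real values of \(\kappa\), which is where contour deformation becomes necessary.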

xVal: A Continuous Numerical Tokenization for Scientific Language Models

Siavash Golkar, Ph.D., Mariel Pettee, Ph.D., M. Eickenberg, A. Bietti, et al.

Due in part to their discontinuous and discrete default encodings for numbers, Large Language Models (LLMs) have not yet been commonly used to process numerically dense scientific datasets. Rendering datasets as text, however, could help aggregate diverse and multi-modal scientific data into a single training corpus, thereby potentially facilitating the development of foundation models for science. In this work, we introduce xVal, a strategy for continuously tokenizing numbers within language models that results in a more appropriate inductive bias for scientific applications. By training specially-modified language models from scratch on a variety of scientific datasets formatted as text, we find that xVal generally outperforms other common numerical tokenization strategies on metrics including out-of-distribution generalization and computational efficiency.
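
A minimal sketch of the encoding side of a continuous number tokenization, assuming the scheme in which every number is replaced by a single [NUM] token whose embedding vector is multiplied by the number's value (in practice values are typically rescaled to a bounded range first, which is omitted here). The regex, vocabulary, embedding size, and function names are illustrative assumptions, not the xVal reference implementation.

```python
import re

import numpy as np

# Integers, decimals, optional sign and exponent (an illustrative, not exhaustive, pattern).
NUM_RE = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def xval_tokenize(text):
    """Replace every number with a [NUM] token; return tokens and the extracted values."""
    values = [float(m) for m in NUM_RE.findall(text)]
    tokens = NUM_RE.sub(" [NUM] ", text).split()
    return tokens, values


def xval_embed(tokens, values, table, vocab):
    """Look up token embeddings, scaling the [NUM] embedding by each value in order."""
    vals = iter(values)
    rows = []
    for tok in tokens:
        vec = table[vocab[tok]]
        if tok == "[NUM]":
            vec = vec * next(vals)  # the value enters continuously, not via digit tokens
        rows.append(vec)
    return np.stack(rows)


tokens, values = xval_tokenize("mass 1.5 kg at t=-3")
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
table = np.random.default_rng(0).normal(size=(len(vocab), 8))
emb = xval_embed(tokens, values, table, vocab)
print(tokens)  # ['mass', '[NUM]', 'kg', 'at', 't=', '[NUM]']
print(values)  # [1.5, -3.0]
```

Because the value multiplies the embedding, nearby numbers map to nearby input vectors. Decoding numbers back out of the model needs a separate numerical output, which this sketch does not cover.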

A Dual-space Multilevel Kernel-splitting Framework for Discrete and Continuous Convolution

We introduce a new class of multilevel, adaptive, dual-space methods for computing fast convolutional transformations. These methods can be applied to a broad class of kernels, from the Green's functions for classical partial differential equations (PDEs) to power functions and radial basis functions such as those used in statistics and machine learning. The DMK (dual-space multilevel kernel-splitting) framework uses a hierarchy of grids, computing a smoothed interaction at the coarsest level, followed by a sequence of corrections at finer and finer scales until the problem is entirely local, at which point direct summation is applied. Unlike earlier multilevel summation schemes, DMK exploits the fact that the interaction at each scale is diagonalized by a short Fourier transform, permitting the use of separation of variables, but without relying on the FFT. This requires careful attention to the discretization of the Fourier transform at each spatial scale. Like multilevel summation, we make use of a recursive (telescoping) decomposition of the original kernel into the sum of a smooth far-field kernel, a sequence of difference kernels, and a residual kernel, which plays a role only in leaf boxes in the adaptive tree. At all higher levels in the grid hierarchy, the interaction kernels are designed to be smooth in both physical and Fourier space, admitting efficient Fourier spectral approximations. The DMK framework substantially simplifies the algorithmic structure of the fast multipole method (FMM) and unifies the FMM, Ewald summation, and multilevel summation, achieving speeds comparable to the FFT in work per gridpoint, even in a fully adaptive context. For continuous source distributions, the evaluation of local interactions is further accelerated by approximating the kernel at the finest level as a sum of Gaussians (SOG) with a highly localized remainder. The Gaussian convolutions are calculated using tensor product transforms, and the remainder term is calculated using asymptotic methods. We illustrate the performance of DMK for both continuous and discrete sources with extensive numerical examples in two and three dimensions.
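
The telescoping decomposition into a far-field kernel, difference kernels, and a residual can be illustrated with the classical Ewald-style Gaussian splitting of the Coulomb kernel \(1/r\). This is a swapped-in, well-known choice used only for illustration; the DMK kernels themselves are designed differently. The scale sequence `sigmas` (halving per level, mimicking a grid hierarchy) is an assumption.

```python
import math


def telescoping_split(r, sigmas):
    """Split 1/r (r > 0) into a smooth far-field part, difference kernels, and a residual.

    sigmas must be decreasing (coarsest level first):
      1/r = erf(r/s_0)/r + sum_l [erf(r/s_{l+1}) - erf(r/s_l)]/r + erfc(r/s_L)/r
    """
    far = math.erf(r / sigmas[0]) / r  # smooth, finite as r -> 0
    diffs = [(math.erf(r / sigmas[l + 1]) - math.erf(r / sigmas[l])) / r
             for l in range(len(sigmas) - 1)]  # smooth corrections, one per level
    residual = math.erfc(r / sigmas[-1]) / r  # singular at 0, Gaussian decay in r
    return far, diffs, residual


sigmas = [1.0, 0.5, 0.25, 0.125]
for r in [1e-3, 0.1, 1.0, 10.0]:
    far, diffs, residual = telescoping_split(r, sigmas)
    print(r, far + sum(diffs) + residual - 1.0 / r, residual)
```

In 3D the Fourier transform of \(\mathrm{erf}(r/\sigma)/r\) is \(4\pi e^{-k^2\sigma^2/4}/k^2\), so each difference kernel has a transform that stays bounded at \(k = 0\) and decays like a Gaussian, which is what makes a short Fourier representation possible at each level. Only the residual, which is negligible beyond a few multiples of the finest scale, needs local direct evaluation.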

Multiple Physics Pretraining for Physical Surrogate Models

Michael McCabe, B. Régaldo-Saint Blancard, Liam Holden Parker, R. Ohana, Miles Cranmer, A. Bietti, Michael Eickenberg, et al.

We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. In MPP, rather than training one model on a specific physical system, we train a backbone model to predict the dynamics of multiple heterogeneous physical systems simultaneously in order to learn features that are broadly useful across systems and facilitate transfer. In order to learn effectively in this setting, we introduce a shared embedding and normalization strategy that projects the fields of multiple systems into a shared embedding space. We validate the efficacy of our approach on both pretraining and downstream tasks over a broad fluid mechanics-oriented benchmark. We show that a single MPP-pretrained transformer is able to match or outperform task-specific baselines on all pretraining sub-tasks without the need for finetuning. For downstream tasks, we demonstrate that finetuning MPP-trained models results in more accurate predictions across multiple time-steps on systems with previously unseen physical components or higher dimensional systems compared to training from scratch or finetuning pretrained video foundation models. We open-source our code and model weights trained at multiple scales for reproducibility.
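
The shared embedding and normalization step can be pictured as follows. This is a hypothetical sketch, not MPP's actual layer: it assumes each field type (named in an invented vocabulary `FIELD_NAMES`) owns one learned embedding vector, and that each field is normalized per sample by its RMS before projection. Under these assumptions, systems with different sets and scales of fields land in the same embedding space.

```python
import numpy as np

# Hypothetical shared vocabulary of physical fields across all pretraining systems.
FIELD_NAMES = ["density", "pressure", "velocity_x", "velocity_y"]


def shared_embed(fields, E, eps=1e-8):
    """Project a variable set of 2D fields into one shared D-dimensional space.

    fields: dict mapping a name in FIELD_NAMES to an (H, W) array for one sample.
    E: (len(FIELD_NAMES), D) array with one (learned) vector per field type.
    Returns the (H, W, D) embedding and the per-field scales needed to undo the
    normalization on the model's outputs.
    """
    height, width = next(iter(fields.values())).shape
    out = np.zeros((height, width, E.shape[1]))
    scales = {}
    for name, f in fields.items():
        scale = np.sqrt(np.mean(f ** 2)) + eps  # per-sample RMS normalization
        scales[name] = scale
        out += (f / scale)[..., None] * E[FIELD_NAMES.index(name)]
    return out, scales


rng = np.random.default_rng(0)
E = rng.normal(size=(len(FIELD_NAMES), 16))
# Two systems with different field sets share the same embedding space.
fluid = {"density": rng.random((8, 8)), "velocity_x": rng.normal(size=(8, 8))}
acoustic = {"pressure": rng.normal(size=(8, 8))}
z1, _ = shared_embed(fluid, E)
z2, _ = shared_embed(acoustic, E)
print(z1.shape, z2.shape)  # (8, 8, 16) (8, 8, 16)
```

Normalizing before projection makes the embedding insensitive to the overall magnitude of each field, so one backbone can see, for example, pressures and velocities of very different scales; the returned scales let predictions be mapped back to physical units.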

Provable Posterior Sampling with Denoising Oracles via Tilted Transport

Joan Bruna, J. Han

Score-based diffusion models have significantly advanced high-dimensional data generation across various domains, by learning a denoising oracle (or score) from datasets. From a Bayesian perspective, they offer a realistic modeling of data priors and facilitate solving inverse problems through posterior sampling. Although many heuristic methods have been developed recently for this purpose, they lack the quantitative guarantees needed in many scientific applications. This work addresses the topic from two perspectives. We first present a hardness result indicating that a generic method leveraging the prior denoising oracle for posterior sampling becomes infeasible as soon as the measurement operator is mildly ill-conditioned. We next develop the tilted transport technique, which leverages the quadratic structure of the log-likelihood in linear inverse problems in combination with the prior denoising oracle to exactly transform the original posterior sampling problem into a new one that is provably easier to sample from. We quantify the conditions under which the boosted posterior is strongly log-concave, highlighting how task difficulty depends on the condition number of the measurement matrix and the signal-to-noise ratio. The resulting general scheme is shown to match the best-known sampling methods for Ising models, and is further validated on high-dimensional Gaussian mixture models.
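
The dependence of sampling difficulty on the conditioning of the measurement matrix and on the signal-to-noise ratio is already visible in the simplest case: a Gaussian prior with a linear Gaussian likelihood, where the posterior is Gaussian and its log-concavity constants are explicit. This is a standard textbook computation included for intuition, not the tilted transport scheme; the standard normal prior and the diagonal test matrix are assumptions.

```python
import numpy as np


def gaussian_posterior(A, y, sigma):
    """Posterior for prior x ~ N(0, I) and likelihood y = A x + sigma * noise.

    The log-likelihood -||y - A x||^2 / (2 sigma^2) is quadratic in x, so the
    negative log-posterior has the constant Hessian P = I + A^T A / sigma^2.
    """
    d = A.shape[1]
    P = np.eye(d) + A.T @ A / sigma**2
    mean = np.linalg.solve(P, A.T @ y / sigma**2)
    return mean, P


def predicted_condition_number(A, sigma):
    """cond(P) from the singular values of a square A: the posterior is strongly
    log-concave with constant lambda_min(P) and smooth with constant lambda_max(P)."""
    s = np.linalg.svd(A, compute_uv=False)
    return (1 + s.max()**2 / sigma**2) / (1 + s.min()**2 / sigma**2)


A = np.diag([1.0, 0.01])  # ill-conditioned measurement matrix, cond(A) = 100
y = np.array([1.0, 1.0])
for sigma in [1.0, 0.1, 0.01]:  # increasing signal-to-noise ratio
    mean, P = gaussian_posterior(A, y, sigma)
    print(sigma, np.linalg.cond(P), predicted_condition_number(A, sigma))
```

As \(\sigma \to 0\) the posterior condition number tends to \(\mathrm{cond}(A)^2\), so for an ill-conditioned measurement matrix, higher signal-to-noise ratios make even this exactly Gaussian posterior harder for generic log-concave samplers.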

Compressing the memory variables in constant-Q viscoelastic wave propagation via an improved sum-of-exponentials approximation

Xu Guo, S. Jiang, Yunfeng Xiong, Jiwei Zhang

The Earth introduces strong attenuation and dispersion to propagating waves. The time-fractional wave equation with very small fractional exponent, based on Kjartansson's constant-Q theory, is widely recognized in geophysics as a reliable model for frequency-independent Q anelastic behavior. Nonetheless, the numerical solution of this equation poses considerable challenges because it requires storing the complete time history of the wavefields. To address this computational challenge, we present a novel approach: a nearly optimal sum-of-exponentials (SOE) approximation to the Caputo fractional derivative with very small fractional exponent, utilizing the machinery of generalized Gaussian quadrature. This method minimizes the number of memory variables needed to approximate the power attenuation law within a specified error tolerance. We establish a mathematical equivalence between this SOE approximation and the continuous fractional stress-strain relationship, relating it to the generalized Maxwell body model. Furthermore, we prove an improved SOE approximation error bound to thoroughly assess the ability of rheological models to replicate the power attenuation law. Numerical simulations are performed on the constant-Q viscoacoustic equation in 3D homogeneous media and on variable-order P- and S-wave viscoelastic equations in 3D inhomogeneous media. These simulations demonstrate that our proposed technique accurately captures changes in amplitude and phase resulting from material anelasticity. This advancement is a significant step towards the practical use of the time-fractional wave equation in seismic inversion.
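
One simple way to see where an SOE approximation of a power-law kernel comes from is to write \(t^{-\alpha} = \frac{1}{\Gamma(\alpha)} \int_0^\infty s^{\alpha-1} e^{-st}\,ds\) and discretize the integral. The sketch below uses a truncated trapezoidal rule in the logarithmic variable \(u = \log s\), a swapped-in, non-optimal quadrature rather than the generalized Gaussian quadrature of the paper; the step size, truncation limits, and \(\alpha = 0.5\) are illustrative assumptions.

```python
import math

import numpy as np


def soe_power_law(alpha, h=0.4, u_min=-40.0, u_max=12.0):
    """Sum-of-exponentials t**(-alpha) ~= sum_j w_j exp(-s_j t) for 0 < alpha < 1.

    Starts from t**(-alpha) = (1/Gamma(alpha)) * int_R exp(alpha*u - exp(u)*t) du
    (substituting s = exp(u)) and applies the truncated trapezoidal rule in u.
    """
    u = np.arange(u_min, u_max + 0.5 * h, h)
    s = np.exp(u)                                  # decay rates of the exponentials
    w = h * np.exp(alpha * u) / math.gamma(alpha)  # quadrature weights
    return s, w


def eval_soe(t, s, w):
    """Evaluate the SOE approximation at the points t (t > 0)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.exp(-np.outer(t, s)) @ w


alpha = 0.5
s, w = soe_power_law(alpha)
t = np.logspace(-3, 0, 50)  # approximation interval [1e-3, 1]
rel_err = np.max(np.abs(eval_soe(t, s, w) - t**-alpha) / t**-alpha)
print(len(s), rel_err)
```

Each exponential term corresponds to one memory variable: \(\phi_j(t) = \int_0^t e^{-s_j(t-\tau)} f(\tau)\,d\tau\) obeys \(\phi_j' = -s_j \phi_j + f\), so it can be advanced step by step instead of storing the full wavefield history. For very small \(\alpha\) the factor \(e^{\alpha u}\) decays slowly as \(u \to -\infty\), so this naive rule needs far more terms for the same accuracy, which is where an optimized quadrature pays off.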

An adaptive spectral method for oscillatory second-order linear ODEs with frequency-independent cost

F. Agocs, A. Barnett

We introduce an efficient numerical method for second-order linear ODEs whose solution may vary between highly oscillatory and slowly changing over the solution interval. In oscillatory regions the solution is generated via a nonoscillatory phase function that obeys the nonlinear Riccati equation. We propose a defect correction iteration that gives an asymptotic series for such a phase function; this is numerically approximated on a Chebyshev grid with a small number of nodes. For analytic coefficients we prove that each iteration, up to a certain maximum number, reduces the residual by a factor of order of the local frequency. The algorithm adapts both the stepsize and the choice of method, switching to a conventional spectral collocation method away from oscillatory regions. In numerical experiments we find that our proposal outperforms other state-of-the-art oscillatory solvers, most significantly at low to intermediate frequencies and at low tolerances, where it may use up to \(10^6\) times fewer function evaluations. Even in high-frequency regimes, our implementation is on average 10 times faster than other specialized solvers.
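
The core of the oscillatory step can be sketched for the simplest case \(u'' + \omega(t)^2 u = 0\). Writing \(u = \exp(\int x\,dt)\) gives the Riccati equation \(x' + x^2 + \omega^2 = 0\); starting from \(x_0 = i\omega\), one repeatedly corrects \(x\) using the residual \(R[x] = x' + x^2 + \omega^2\). The update \(x \leftarrow x - R[x]/(2x)\), the Airy-type test case \(\omega(t) = \sqrt{t}\), and the grid size are assumptions of this sketch; derivatives use a standard Chebyshev differentiation matrix.

```python
import numpy as np


def cheb(n):
    """Chebyshev points x_j = cos(pi j / n) on [-1, 1] and the spectral
    differentiation matrix D (after Trefethen, Spectral Methods in MATLAB)."""
    x = np.cos(np.pi * np.arange(n + 1) / n)
    c = np.ones(n + 1)
    c[0] = c[-1] = 2.0
    c *= (-1.0) ** np.arange(n + 1)
    dx = x[:, None] - x[None, :]
    D = np.outer(c, 1.0 / c) / (dx + np.eye(n + 1))
    D -= np.diag(D.sum(axis=1))  # diagonal entries via the negative-sum trick
    return D, x


def riccati_defect_correction(omega, a, b, n=16, iters=2):
    """Approximate a nonoscillatory x(t) on [a, b] with x' + x**2 + omega(t)**2 = 0,
    so that u = exp(int x dt) solves u'' + omega**2 u = 0.

    Returns the grid, the final x, and the max residual before each update and after the last.
    """
    D, xi = cheb(n)
    t = a + (b - a) * (xi + 1) / 2  # map [-1, 1] -> [a, b]
    Dt = D * 2.0 / (b - a)          # chain rule for the mapped derivative
    w = omega(t)
    x = 1j * w                       # leading-order (WKB) guess
    residuals = []
    for _ in range(iters):
        R = Dt @ x + x**2 + w**2
        residuals.append(np.max(np.abs(R)))
        x = x - R / (2 * x)          # assumed defect-correction update
    residuals.append(np.max(np.abs(Dt @ x + x**2 + w**2)))
    return t, x, residuals


t, x, res = riccati_defect_correction(np.sqrt, 100.0, 101.0)
print(res)
```

In this run each correction shrinks the residual by a large factor, in line with the frequency-dependent reduction described above. Because \(x\) is nonoscillatory, the number of Chebyshev nodes does not need to grow with \(\omega\), even though \(u\) itself oscillates rapidly.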
