CCM: Publications

riccati: an adaptive, spectral solver for oscillatory ODEs

Equispaced Fourier representations for efficient Gaussian process regression from a billion data points

We introduce a Fourier-based fast algorithm for Gaussian process regression in low dimensions. It approximates a translationally-invariant covariance kernel by complex exponentials on an equispaced Cartesian frequency grid of $M$ nodes. This results in a weight-space $M\times M$ system matrix with Toeplitz structure, which can thus be applied to a vector in ${\mathcal O}(M \log{M})$ operations via the fast Fourier transform (FFT), independent of the number of data points $N$. The linear system can be set up in ${\mathcal O}(N + M \log{M})$ operations using nonuniform FFTs. This enables efficient massive-scale regression via an iterative solver, even for kernels with fat-tailed spectral densities (large $M$). We provide bounds on both kernel approximation and posterior mean errors. Numerical experiments for squared-exponential and Matérn kernels in one, two and three dimensions often show 1-2 orders of magnitude acceleration over state-of-the-art rank-structured solvers at comparable accuracy. Our method allows 2D Matérn-$\mbox{$\frac{3}{2}$}$ regression from $N=10^9$ data points to be performed in 2 minutes on a standard desktop, with posterior mean accuracy $10^{-3}$. This opens up spatial statistics applications 100 times larger than previously possible.

Show Abstract

A new version of the adaptive fast Gauss transform for discrete and continuous sources

L. Greengard, S. Jiang, M. Rachh, J. Wang

We present a new version of the fast Gauss transform (FGT) for discrete and continuous sources. Classical Hermite expansions are avoided entirely, making use only of the plane-wave representation of the Gaussian kernel and a new hierarchical merging scheme. For continuous source distributions sampled on adaptive tensor-product grids, we exploit the separable structure of the Gaussian kernel to accelerate the computation. For discrete sources, the scheme relies on the nonuniform fast Fourier transform (NUFFT) to construct near field plane wave representations. The scheme has been implemented for either free-space or periodic boundary conditions. In many regimes, the speed is comparable to or better than that of the conventional FFT in work per gridpoint, despite being fully adaptive.

Show Abstract

A fast, accurate and easy to implement Kapur — Rokhlin quadrature scheme for singular integrals in axisymmetric geometries

Evan Toler, A.J. Cerfon, D. Malhotra

Many applications in magnetic confinement fusion require the efficient calculation of surface integrals with singular integrands. The singularity subtraction approaches typically used to handle such singularities are complicated to implement and low-order accurate. In contrast, we demonstrate that the Kapur–Rokhlin quadrature scheme is well-suited for the logarithmically singular integrals encountered for a toroidally axisymmetric confinement system, is easy to implement and is high-order accurate. As an illustration, we show how to apply this quadrature scheme for the efficient and accurate calculation of the normal component of the magnetic field due to the plasma current on the plasma boundary, via the virtual-casing principle.

Show Abstract

Eliminating Artificial Boundary Conditions in Time-Dependent Density Functional Theory Using Fourier Contour Deformation

J. Kaye, A. Barnett, L. Greengard, Umberto De Giovannini, A. Rubio

We present an efficient method for propagating the time-dependent Kohn–Sham equations in free space, based on the recently introduced Fourier contour deformation (FCD) approach. For potentials which are constant outside a bounded domain, FCD yields a high-order accurate numerical solution of the time-dependent Schrödinger equation directly in free space, without the need for artificial boundary conditions. Of the many existing artificial boundary condition schemes, FCD is most similar to an exact nonlocal transparent boundary condition, but it works directly on Cartesian grids in any dimension, and runs on top of the fast Fourier transform rather than fast algorithms for the application of nonlocal history integral operators. We adapt FCD to time-dependent density functional theory (TDDFT), and describe a simple algorithm to smoothly and automatically truncate long-range Coulomb-like potentials to a time-dependent constant outside of a bounded domain of interest, so that FCD can be used. This approach eliminates errors originating from the use of artificial boundary conditions, leaving only the error of the potential truncation, which is controlled and can be systematically reduced. The method enables accurate simulations of ultrastrong nonlinear electronic processes in molecular complexes in which the interference between bound and continuum states is of paramount importance. We demonstrate results for many-electron TDDFT calculations of absorption and strong field photoelectron spectra for one and two-dimensional models, and observe a significant reduction in the size of the computational domain required to achieve high quality results, as compared with the popular method of complex absorbing potentials.

Show Abstract

A high-order integral equation-based solver for the time-dependent Schrödinger equation

J. Kaye, A. Barnett, L. Greengard

We introduce a numerical method for the solution of the time-dependent Schrödinger equation with a smooth potential, based on its reformulation as a Volterra integral equation. We present versions of the method both for periodic boundary conditions, and for free space problems with compactly supported initial data and potential. A spatially uniform electric field may be included, making the solver applicable to simulations of light-matter interaction. The primary computational challenge in using the Volterra formulation is the application of a spacetime history dependent integral operator. This may be accomplished by projecting the solution onto a set of Fourier modes, and updating their coefficients from one time step to the next by a simple recurrence. In the periodic case, the modes are those of the usual Fourier series, and the fast Fourier transform (FFT) is used to alternate between physical and frequency domain grids. In the free space case, the oscillatory behavior of the spectral Green's function leads us to use a set of complex-frequency Fourier modes obtained by discretizing a contour deformation of the inverse Fourier transform, and we develop a corresponding fast transform based on the FFT. Our approach is related to pseudospectral methods, but applied to an integral rather than the usual differential formulation. This has several advantages: it avoids the need for artificial boundary conditions, admits simple, inexpensive, high-order implicit time marching schemes, and naturally includes time-dependent potentials. We present examples in one and two dimensions showing spectral accuracy in space and eighth-order accuracy in time for both periodic and free space problems.

Show Abstract

How exponentially ill-conditioned are contiguous submatrices of the Fourier matrix?

A. Barnett

Linear systems involving contiguous submatrices of the discrete Fourier transform (DFT)matrix arise in many applications, such as Fourier extension, superresolution, and coherent diffraction imaging. We show that the condition number of any such p\times q submatrix of the N\times NDFT matrix is at least exp\bigl( \pi 2\bigl[ min(p,q) - pqN\bigr] \bigr) , up to algebraic prefactors.That is, fixing the shape parameters (\alpha ,\beta ) := (p/N,q/N)\in (0,1)2, the growth ise\rho NasN\rightarrow \infty , the exponential rate being\rho =\pi 2[min(\alpha ,\beta ) - \alpha \beta ]. Our proof uses theKaiser--Bessel transform pair (of which we give a self-contained proof), plus estimates on sums over distorted sinc functions, to construct a localized trial vector whose DFT is also localized. We warm up with an elementary proof of the above but with half the rate, via a periodized Gaussian trial vector. Using low-rank approximation of the kerneleixt, we also prove another lower bound (4/e\pi \alpha )q, up to algebraic prefactors, which is stronger than the above for small\alpha and\beta . When combined, the bounds are within a factor of two ofthe empirical asymptotic rate, uniformly over (0,1)2, and become sharp in certain regions.However, the results are not asymptotic: they apply to essentially allN,p, andq, and with all constants explicit.

Show Abstract

Geometry of the Phase Retrieval Problem: Graveyard of Algorithms

A. Barnett, C. Epstein, L. Greengard, J. Magland

Recovering the phase of the Fourier transform is a ubiquitous problem in imaging applications from astronomy to nanoscale X-ray diffraction imaging. Despite the efforts of a multitude of scientists, from astronomers to mathematicians, there is, as yet, no satisfactory theoretical or algorithmic solution to this class of problems. Written for mathematicians, physicists and engineers working in image analysis and reconstruction, this book introduces a conceptual, geometric framework for the analysis of these problems, leading to a deeper understanding of the essential, algorithmically independent, difficulty of their solutions. Using this framework, the book studies standard algorithms and a range of theoretical issues in phase retrieval and provides several new algorithms and approaches to this problem with the potential to improve the reconstructed images. The book is lavishly illustrated with the results of numerous numerical experiments that motivate the theoretical development and place it in the context of practical applications.

Show Abstract

cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs

Yu-hsuan Shih, Garrett Wright, Joakim Andén, Johannes Blaschke, A. Barnett

Nonuniform fast Fourier transforms dominate the computational cost in many applications including image reconstruction and signal processing. We thus present a general-purpose GPU-based CUDA library for type 1 (nonuniform to uniform) and type 2 (uniform to nonuniform) transforms in dimensions 2 and 3, in single or double precision. It achieves high performance for a given user-requested accuracy, regardless of the distribution of nonuniform points, via cache-aware point reordering, and load-balanced blocked spreading in shared memory. At low accuracies, this gives on-GPU throughputs around $10^9$ nonuniform points per second, and (even including host-device transfer) is typically 4-10$\times$ faster than the latest parallel CPU code FINUFFT (at 28 threads). It is competitive with two established GPU codes, being up to 90$\times$ faster at high accuracy and/or type 1 clustered point distributions. Finally we demonstrate a 6-18$\times$ speedup versus CPU in an X-ray diffraction 3D iterative reconstruction task at $10^{-12}$ accuracy, observing excellent multi-GPU weak scaling up to one rank per GPU. paper talk: https://www.youtube.com/watch?v=PnW6ehMyHxM.

Show Abstract

Fast multipole methods for evaluation of layer potentials with locally-corrected quadratures

L. Greengard, Michael O'Neil, M. Rachh, Felipe Vico

While fast multipole methods (FMMs) are in widespread use for the rapid evaluation of potential fields governed by the Laplace, Helmholtz, Maxwell or Stokes equations, their coupling to high-order quadratures for evaluating layer potentials is still an area of active research. In three dimensions, a number of issues need to be addressed, including the specification of the surface as the union of high-order patches, the incorporation of accurate quadrature rules for integrating singular or weakly singular Green's functions on such patches, and their coupling to the oct-tree data structures on which the FMM separates near and far field interactions. Although the latter is straightforward for point distributions, the near field for a patch is determined by its physical dimensions, not the distribution of discretization points on the surface. Here, we present a general framework for efficiently coupling locally corrected quadratures with FMMs, relying primarily on what are called generalized Gaussian quadratures rules, supplemented by adaptive integration. The approach, however, is quite general and easily applicable to other schemes, such as Quadrature by Expansion (QBX). We also introduce a number of accelerations to reduce the cost of quadrature generation itself, and present several numerical examples of acoustic scattering that demonstrate the accuracy, robustness, and computational efficiency of the scheme. On a single core of an Intel i5 2.3GHz processor, a Fortran implementation of the scheme can generate near field quadrature corrections for between 1000 and 10,000 points per second, depending on the order of accuracy and the desired precision. A Fortran implementation of the algorithm described in this work is available at https://gitlab.com/fastalgorithms/fmm3dbiethis https URL.

Show Abstract