CCM: Publications

SP2 : A Second Order Stochastic Polyak Method

Shuang Li , William Joseph Swartworth , Martin Takáč , Deanna Needell, R. M. Gower

Recently the SP (Stochastic Polyak step size) method has emerged as a competitive adaptive method for setting the step sizes of SGD. SP can be interpreted as a method specialized to interpolated models, since it solves the interpolation equations. SP solves these equations by using local linearizations of the model. We take a step further and develop a method for solving the interpolation equations that uses the local second-order approximation of the model. Our resulting method SP2 uses Hessian-vector products to speed up the convergence of SP. Furthermore, and rather uniquely among second-order methods, the design of SP2 in no way relies on positive definite Hessian matrices or convexity of the objective function. We show SP2 is competitive both in experiments and in theory.
We show SP2 is very competitive on matrix completion, non-convex test problems and logistic regression. We also provide a convergence theory on sums-of-quadratics.
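
As a rough illustration of the first-order idea that the abstract builds on, the sketch below applies a plain SP update (not SP2) to a small consistent least-squares problem. It assumes the interpolation setting where every per-sample loss can reach zero, so the update projects the current point onto the linearized equation loss + <grad, w' - w> = 0. The names `sp_step` and `run_sp` and the toy data are hypothetical, chosen only for this example.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sp_step(w, loss, grad):
    """One SP update, assuming the optimal per-sample loss is zero.

    Moves w to the closest point satisfying loss + <grad, w' - w> = 0.
    """
    g2 = dot(grad, grad)
    if g2 == 0.0:
        return w  # zero gradient: the linearized equation gives no direction
    step = loss / g2
    return [wi - step * gi for wi, gi in zip(w, grad)]


def run_sp(rows, targets, iters=2000, seed=0):
    """Run SP on the losses f_i(w) = 0.5 * (a_i . w - b_i)^2."""
    rng = random.Random(seed)
    w = [0.0] * len(rows[0])
    for _ in range(iters):
        i = rng.randrange(len(rows))          # sample one data point
        r = dot(rows[i], w) - targets[i]      # residual of that point
        loss = 0.5 * r * r
        grad = [r * a for a in rows[i]]
        w = sp_step(w, loss, grad)
    return w


# Toy interpolated model: the targets are generated exactly by w_true.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_true = [2.0, -1.0]
targets = [dot(a, w_true) for a in rows]
w = run_sp(rows, targets)
```

For this quadratic loss the step reduces to moving halfway toward the hyperplane a_i . w = b_i. SP2, as described in the abstract, would replace the linearization with a local second-order model using Hessian-vector products; that variant is not shown here.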

Handbook of Convergence Theorems for (Stochastic) Gradient Methods

Guillaume Garrigos, R. M. Gower

This is a handbook of simple proofs of the convergence of gradient and stochastic gradient descent type methods. We consider functions that are Lipschitz, smooth, convex, strongly convex, and/or Polyak-Łojasiewicz functions. Our focus is on

A fast, high-order numerical method for the simulation of single-excitation states in quantum optics

Jeremy Hoskins, J. Kaye, M. Rachh, John C. Schotland

We consider the numerical solution of a nonlocal partial differential equation which models the process of collective spontaneous emission in a two-level atomic system containing a single photon. We reformulate the problem as an integro-differential equation for the atomic degrees of freedom, and describe an efficient solver for the case of a Gaussian atomic density. The problem of history dependence arising from the integral formulation is addressed using sum-of-exponentials history compression. We demonstrate the solver on two systems of physical interest: in the first, an initially-excited atom decays into a photon by spontaneous emission, and in the second, a photon pulse is used to excite an atom, which then decays.

Coordinated drift of receptive fields in Hebbian/anti-Hebbian network models during noisy representation learning

Shanshan Qin, S. Farashahi, D. Lipshutz, A. Sengupta, D. Chklovskii, Cengiz Pehlevan

Recent experiments have revealed that neural population codes in many brain areas continuously change even when animals have fully learned and stably perform their tasks. This representational ‘drift’ naturally leads to questions about its causes, dynamics and functions. Here we explore the hypothesis that neural representations optimize a representational objective with a degenerate solution space, and noisy synaptic updates drive the network to explore this (near-)optimal space causing representational drift. We illustrate this idea and explore its consequences in simple, biologically plausible Hebbian/anti-Hebbian network models of representation learning. We find that the drifting receptive fields of individual neurons can be characterized by a coordinated random walk, with effective diffusion constants depending on various parameters such as learning rate, noise amplitude and input statistics. Despite such drift, the representational similarity of population codes is stable over time. Our model recapitulates experimental observations in the hippocampus and posterior parietal cortex and makes testable predictions that can be probed in future experiments.

Adaptive Tuning for Metropolis Adjusted Langevin Trajectories

Lionel Riou-Durand, Pavel Sountsov, Jure Vogrinc, C. Margossian, Sam Power

Hamiltonian Monte Carlo (HMC) is a widely used sampler for continuous probability distributions. In many cases, the underlying Hamiltonian dynamics exhibit a phenomenon of resonance which decreases the efficiency of the algorithm and makes it very sensitive to hyperparameter values. This issue can be tackled efficiently, either via the use of trajectory length randomization (RHMC) or via partial momentum refreshment. The second approach is connected to the kinetic Langevin diffusion, and has been mostly investigated through the use of Generalized HMC (GHMC). However, GHMC induces momentum flips upon rejections causing the sampler to backtrack and waste computational resources. In this work we focus on a recent algorithm bypassing this issue, named Metropolis Adjusted Langevin Trajectories (MALT). We build upon recent strategies for tuning the hyperparameters of RHMC which target a bound on the Effective Sample Size (ESS) and adapt it to MALT, thereby enabling the first user-friendly deployment of this algorithm. We construct a method to optimize a sharper bound on the ESS and reduce the estimator variance. Easily compatible with parallel implementation, the resultant Adaptive MALT algorithm is competitive in terms of ESS rate and hits useful tradeoffs in memory usage when compared to GHMC, RHMC and NUTS.

Eliminating Artificial Boundary Conditions in Time-Dependent Density Functional Theory Using Fourier Contour Deformation

J. Kaye, A. Barnett, L. Greengard, Umberto De Giovannini, A. Rubio

We present an efficient method for propagating the time-dependent Kohn–Sham equations in free space, based on the recently introduced Fourier contour deformation (FCD) approach. For potentials which are constant outside a bounded domain, FCD yields a high-order accurate numerical solution of the time-dependent Schrödinger equation directly in free space, without the need for artificial boundary conditions. Of the many existing artificial boundary condition schemes, FCD is most similar to an exact nonlocal transparent boundary condition, but it works directly on Cartesian grids in any dimension, and runs on top of the fast Fourier transform rather than fast algorithms for the application of nonlocal history integral operators. We adapt FCD to time-dependent density functional theory (TDDFT), and describe a simple algorithm to smoothly and automatically truncate long-range Coulomb-like potentials to a time-dependent constant outside of a bounded domain of interest, so that FCD can be used. This approach eliminates errors originating from the use of artificial boundary conditions, leaving only the error of the potential truncation, which is controlled and can be systematically reduced. The method enables accurate simulations of ultrastrong nonlinear electronic processes in molecular complexes in which the interference between bound and continuum states is of paramount importance. We demonstrate results for many-electron TDDFT calculations of absorption and strong field photoelectron spectra for one and two-dimensional models, and observe a significant reduction in the size of the computational domain required to achieve high quality results, as compared with the popular method of complex absorbing potentials.

Generative Models of Multichannel Data from a Single Example—Application to Dust Emission

B. Régaldo-Saint Blancard, Erwan Allys, Constant Auclair, François Boulanger, M. Eickenberg, François Levrier, Léo Vacher, Sixin Zhang

The quest for primordial B-modes in the cosmic microwave background has emphasized the need for refined models of the Galactic dust foreground. Here we aim at building a realistic statistical model of the multifrequency dust emission from a single example. We introduce a generic methodology relying on microcanonical gradient descent models conditioned by an extended family of wavelet phase harmonic (WPH) statistics. To tackle the multichannel aspect of the data, we define cross-WPH statistics, quantifying non-Gaussian correlations between maps. Our data-driven methodology could apply to various contexts, and we have updated the software PyWPH, on which this work relies, accordingly. Applying this to dust emission maps built from a magnetohydrodynamics simulation, we construct and assess two generative models: (1) a (I, E, B) multi-observable input, and (2) a {I

Shedding a PAC-Bayesian Light on Adaptive Sliced-Wasserstein Distances

R. Ohana, Kimia Nadjahi, Alain Rakotomamonjy, Liva Ralaivola

The Sliced-Wasserstein distance (SW) is a computationally efficient and theoretically grounded alternative to the Wasserstein distance. Yet, the literature on its statistical properties -- or, more accurately, its generalization properties -- with respect to the distribution of slices, beyond the uniform measure, is scarce. To bring new contributions to this line of research, we leverage the PAC-Bayesian theory and a central observation that SW may be interpreted as an average risk, the quantity PAC-Bayesian bounds have been designed to characterize. We provide three types of results: i) PAC-Bayesian generalization bounds that hold on what we refer to as adaptive Sliced-Wasserstein distances, i.e. SW defined with respect to arbitrary distributions of slices (among which data-dependent distributions), ii) a principled procedure to learn the distribution of slices that yields maximally discriminative SW, by optimizing our theoretical bounds, and iii) empirical illustrations of our theoretical findings.
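
To make the "average over slices" reading concrete, here is a minimal Monte Carlo sketch of a Sliced-Wasserstein estimate between two equal-size point clouds. It relies on the standard fact that the one-dimensional Wasserstein distance between equal-size samples is obtained by matching sorted values. The `sampler` argument is a hypothetical hook standing in for an arbitrary slice distribution; it does not implement the paper's bound-driven learning procedure, and all function names are assumptions made for this example.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def wasserstein_1d_pow(xs, ys, p=2):
    """W_p^p between two equal-size 1D samples, via sorted matching."""
    assert len(xs) == len(ys)
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) ** p for a, b in pairs) / len(xs)


def uniform_direction(dim, rng):
    """Draw a direction uniformly on the unit sphere (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]


def sliced_wasserstein(X, Y, n_slices=200, p=2, sampler=uniform_direction, seed=0):
    """Average the 1D costs over sampled slices, then take the p-th root."""
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(n_slices):
        theta = sampler(dim, rng)
        total += wasserstein_1d_pow(
            [dot(x, theta) for x in X], [dot(y, theta) for y in Y], p
        )
    return (total / n_slices) ** (1.0 / p)


# Toy data: Y is X shifted by 3 along the first axis.
X = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0], [-1.0, 0.5]]
Y = [[x[0] + 3.0, x[1]] for x in X]

sw_uniform = sliced_wasserstein(X, Y)
# A degenerate slice distribution concentrated on the shift direction.
sw_first_axis = sliced_wasserstein(X, Y, sampler=lambda dim, rng: [1.0, 0.0])
```

In this toy case the slice distribution concentrated on the shift direction gives a larger value than uniform slices, which is the intuition behind choosing slices that discriminate well between the two samples.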

Linear optical random projections without holography

R. Ohana, Daniel Hesslow, Daniel Brunner, Sylvain Gigan, Kilian Müller

We introduce what we believe to be a novel method to perform linear optical random projections without the need for holography. Our method consists of a computationally trivial combination of multiple intensity measurements to mitigate the information loss usually associated with the absolute-square non-linearity imposed by optical intensity measurements. Both experimental and numerical findings demonstrate that the resulting matrix consists of real-valued, independent, and identically distributed (i.i.d.) Gaussian random entries. Our optical setup is simple and robust, as it does not require interference between two beams. We demonstrate the practical applicability of our method by performing dimensionality reduction on high-dimensional data, a common task in randomized numerical linear algebra with relevant applications in machine learning.

A geometrical connection between sparse and low-rank matrices and its application to manifold learning

L. Saul

We consider when a sparse nonnegative matrix \(\mathbf{S}\) can be recovered, via an elementwise nonlinearity, from a real-valued matrix \(\mathbf{L}\) of significantly lower rank. Of particular interest is the setting where the positive elements of \(\mathbf{S}\) encode the similarities of nearby points on a low dimensional manifold. The recovery can then be posed as a problem in manifold learning: in this case, how to learn a norm-preserving and neighborhood-preserving mapping of high dimensional inputs into a lower dimensional space. We describe an algorithm for this problem based on a generalized low-rank decomposition of sparse matrices. This decomposition has the interesting property that it can be encoded by a neural network with one layer of rectified linear units; since the algorithm discovers this encoding, it can also be viewed as a layerwise primitive for deep learning. The algorithm regards the inputs \(\mathbf{x}_i\) and \(\mathbf{x}_j\) as similar whenever the cosine of the angle between them exceeds some threshold \(\tau\in(0,1)\). Given this threshold, the algorithm attempts to discover a mapping \(\mathbf{x}_i\mapsto\mathbf{y}_i\) by matching the elements of two sparse matrices; in particular, it seeks a mapping for which \(\mathbf{S}=\max(0,\mathbf{L})\), where \(S_{ij} = \max(0,\mathbf{x}_i\cdot\mathbf{x}_j - \tau\|\mathbf{x}_i\|\|\mathbf{x}_j\|)\) and \(L_{ij} = \mathbf{y}_i\cdot\mathbf{y}_j - \tau\|\mathbf{y}_i\|\|\mathbf{y}_j\|\). We apply the algorithm to data sets where vector magnitudes and small cosine distances have interpretable meanings (e.g., the brightness of an image, the similarity to other words). On these data sets, the algorithm is able to discover much lower dimensional representations that preserve these meanings.
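
The two matrices defined in the abstract can be computed directly from their formulas. The sketch below builds the sparse similarity matrix from a handful of input vectors and a threshold; it only evaluates the definitions and does not attempt the decomposition algorithm itself. The function names and the toy inputs are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def margin_matrix(vectors, tau):
    """M_ij = v_i . v_j - tau * ||v_i|| * ||v_j||.

    Applied to low-dimensional outputs y_i, this is the matrix L.
    """
    norms = [math.sqrt(dot(v, v)) for v in vectors]
    n = len(vectors)
    return [
        [dot(vectors[i], vectors[j]) - tau * norms[i] * norms[j] for j in range(n)]
        for i in range(n)
    ]


def rectify(matrix):
    """Elementwise max(0, .), the nonlinearity relating S to L."""
    return [[max(0.0, entry) for entry in row] for row in matrix]


def similarity_matrix(inputs, tau):
    """S_ij = max(0, x_i . x_j - tau * ||x_i|| * ||x_j||)."""
    return rectify(margin_matrix(inputs, tau))


# Toy inputs: the first two vectors are 45 degrees apart, the third is orthogonal.
xs = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
tau = 0.5
S = similarity_matrix(xs, tau)
```

An entry of S is positive exactly when the cosine between the two inputs exceeds the threshold, and each diagonal entry equals (1 - tau) times the squared norm of the input. This is why matching S with max(0, L) constrains both the neighborhoods and the norms of the mapped points.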
