287 Publications

Nested R̂: Assessing the convergence of Markov chain Monte Carlo when running many short chains

C. Margossian, Matthew D. Hoffman, Pavel Sountsov, Lionel Riou-Durand, Aki Vehtari, Andrew Gelman

Recent developments in Markov chain Monte Carlo (MCMC) algorithms allow us to run thousands of chains in parallel almost as quickly as a single chain, using hardware accelerators such as GPUs. While each chain still needs to forget its initial point during a warmup phase, the subsequent sampling phase can be shorter than in classical settings, where we run only a few chains. To determine if the resulting short chains are reliable, we need to assess how close the Markov chains are to their stationary distribution after warmup. The potential scale reduction factor R̂ is a popular convergence diagnostic but unfortunately can require a long sampling phase to work well. We present a nested design to overcome this challenge and a generalization called nested R̂. This new diagnostic works under conditions similar to R̂ and completes the workflow for GPU-friendly samplers. In addition, the proposed nesting provides theoretical insights into the utility of R̂, in both the classical and short-chain regimes.
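
As a concrete illustration of the nested design, a minimal NumPy sketch follows; it assumes draws arranged as K superchains of M chains with N post-warmup draws each, and the paper's exact variance decomposition may differ in detail.

```python
import numpy as np

def nested_rhat(draws):
    """Sketch of a nested R-hat style diagnostic.

    draws: array of shape (K, M, N) -- K superchains, each containing
    M chains with N post-warmup draws of a scalar quantity.
    """
    chain_means = draws.mean(axis=2)             # (K, M)
    superchain_means = chain_means.mean(axis=1)  # (K,)
    # Between-superchain variance of the superchain means.
    B = superchain_means.var(ddof=1)
    # Within-superchain variance: average within-chain variance plus
    # the variance of chain means inside each superchain.
    within_chain = draws.var(axis=2, ddof=1).mean()
    between_chain = chain_means.var(axis=1, ddof=1).mean()
    W = within_chain + between_chain
    return np.sqrt(1.0 + B / W)

# For well-mixed chains the statistic is close to 1:
# nested_rhat(np.random.randn(4, 16, 100))
```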


Decomposing imaginary time Feynman diagrams using separable basis functions: Anderson impurity model strong coupling expansion

J. Kaye, H. Strand, D. Golez

We present a deterministic algorithm for the efficient evaluation of imaginary time diagrams based on the recently introduced discrete Lehmann representation (DLR) of imaginary time Green's functions. In addition to the efficient discretization of diagrammatic integrals afforded by its approximation properties, the DLR basis is separable in imaginary time, allowing us to decompose diagrams into linear combinations of nested sequences of one-dimensional products and convolutions. Focusing on the strong coupling bold-line expansion of generalized Anderson impurity models, we show that our strategy reduces the computational complexity of evaluating an $M$th-order diagram at inverse temperature $\beta$ and spectral width $\omega_{\max}$ from $\mathcal{O}((\beta \omega_{\max})^{2M-1})$ for a direct quadrature to $\mathcal{O}(M (\log (\beta \omega_{\max}))^{M+1})$, with controllable high-order accuracy. We benchmark our algorithm using third-order expansions for multi-band impurity problems with off-diagonal hybridization and spin-orbit coupling, presenting comparisons with exact diagonalization and quantum Monte Carlo approaches. In particular, we perform a self-consistent dynamical mean-field theory calculation for a three-band Hubbard model with strong spin-orbit coupling representing a minimal model of Ca$_2$RuO$_4$, demonstrating the promise of the method for modeling realistic strongly correlated multi-band materials. For both strong and weak coupling expansions of low and intermediate order, in which diagrams can be enumerated, our method provides an efficient, straightforward, and robust black-box evaluation procedure. In this sense, it fills a gap between diagrammatic approximations of the lowest order, which are simple and inexpensive but inaccurate, and those based on Monte Carlo sampling of high-order diagrams.
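
To see why separability pays off, note that functions expressed as short sums of exponentials (the form the DLR takes in imaginary time) convolve in closed form, so nested diagrammatic integrals reduce to one-dimensional operations. The sketch below illustrates only this separability idea and is not the paper's algorithm; the coefficient and rate arrays are assumptions.

```python
import numpy as np

def soe_convolve(c1, a1, c2, a2, tau):
    """(f * g)(tau) = int_0^tau f(tau - s) g(s) ds for two
    sum-of-exponentials functions f(t) = sum_i c1[i] exp(-a1[i] t)
    and g(t) = sum_j c2[j] exp(-a2[j] t).

    Each pair of terms convolves analytically, so the cost is O(p^2)
    per evaluation point rather than quadrature over the full history.
    """
    out = 0.0
    for ci, ai in zip(c1, a1):
        for cj, aj in zip(c2, a2):
            if np.isclose(ai, aj):
                out += ci * cj * tau * np.exp(-ai * tau)
            else:
                out += ci * cj * (np.exp(-ai * tau) - np.exp(-aj * tau)) / (aj - ai)
    return out
```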


Explainable Equivariant Neural Networks for Particle Physics: PELICAN

A. Bogatskii, Timothy Hoffman, David W. Miller, Jan T. Offermann, Xiaoyang Liu

PELICAN is a novel permutation-equivariant and Lorentz-invariant or -covariant aggregator network designed to overcome common limitations found in architectures applied to particle physics problems. Compared to many approaches that use non-specialized architectures that neglect underlying physics principles and require very large numbers of parameters, PELICAN employs a fundamentally symmetry group-based architecture that demonstrates benefits in terms of reduced complexity, increased interpretability, and raw performance. We present a comprehensive study of the PELICAN algorithm architecture in the context of both tagging (classification) and reconstructing (regression) Lorentz-boosted top quarks, including the difficult task of specifically identifying and measuring the $W$-boson inside the dense environment of the Lorentz-boosted top-quark hadronic final state. We also extend the application of PELICAN to the tasks of identifying quark-initiated vs. gluon-initiated jets, and a multi-class identification across five separate target categories of jets. When tested on the standard task of Lorentz-boosted top-quark tagging, PELICAN outperforms existing competitors with much lower model complexity and high sample efficiency. On the less common and more complex task of 4-momentum regression, PELICAN also outperforms hand-crafted, non-machine-learning algorithms. We discuss the implications of symmetry-restricted architectures for the wider field of machine learning for physics.
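
The symmetry-respecting inputs such a network consumes can be made concrete: the matrix of pairwise Minkowski inner products of the particle four-momenta is Lorentz invariant and permutation equivariant in the particle index. A minimal illustrative sketch, assuming the (E, px, py, pz) component convention:

```python
import numpy as np

def minkowski_dots(p):
    """Pairwise Minkowski inner products of N four-momenta.

    p: array of shape (N, 4) with components (E, px, py, pz).
    d[i, j] = p_i . p_j is Lorentz invariant, and permuting particles
    permutes rows and columns consistently (permutation equivariance).
    """
    metric = np.diag([1.0, -1.0, -1.0, -1.0])
    return p @ metric @ p.T
```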


Microscopic Theory, Analysis, and Interpretation of Conductance Histograms in Molecular Junctions

Leopoldo Mejía, P. Cossio, Ignacio Franco

Molecular electronics break-junction experiments are widely used to investigate fundamental physics and chemistry at the nanoscale. Reproducibility in these experiments relies on measuring conductance on thousands of freshly formed molecular junctions, yielding a broad histogram of conductance events. Experiments typically focus on the most probable conductance, while the information content of the conductance histogram has remained unclear. Here we develop a microscopic theory for the conductance histogram by merging the theory of force spectroscopy with molecular conductance. The procedure yields analytical equations that accurately fit the conductance histograms of a wide range of molecular junctions and augments the information content that can be extracted from them. Our formulation captures contributions to the conductance dispersion due to conductance changes during the mechanical elongation inherent to the experiments. In turn, the histogram shape is determined by the non-equilibrium stochastic features of junction rupture and formation. The microscopic parameters in the theory capture the junction's electromechanical properties and can be isolated from separate conductance and rupture-force (or junction-lifetime) measurements. The predicted behavior can be used to test the range of validity of the theory, to understand conductance histograms, and to design molecular junction experiments with enhanced resolution as well as molecular devices with more reproducible conductance properties.


Discriminative calibration: Check Bayesian computation from simulations and flexible classifier

Y. Yao, Justin Domke

To check the accuracy of Bayesian computations, it is common to use rank-based simulation-based calibration (SBC). However, SBC has drawbacks: the test statistic is somewhat ad hoc, interactions are difficult to examine, multiple testing is a challenge, and the resulting p-value is not a divergence metric. We propose to replace the marginal rank test with a flexible classification approach that learns test statistics from data. This measure typically has higher statistical power than the SBC rank test and returns an interpretable divergence measure of miscalibration, computed from classification accuracy. The approach can be used with different data-generating processes to address likelihood-free inference or traditional inference methods such as Markov chain Monte Carlo or variational inference. We illustrate an automated implementation using neural networks and statistically inspired features, and validate the method with numerical and real-data experiments.
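
A minimal sketch of the classification idea follows, using scikit-learn; the paper's feature construction and divergence estimates are richer, and here 2*accuracy - 1 (a lower bound on the total variation distance) stands in as the readout.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

def calibration_divergence(joint_feats, inference_feats):
    """Train a classifier to distinguish (theta, y) features drawn from
    the true joint from those produced by the inference method.

    Test accuracy near 0.5 means the classifier cannot tell them apart
    (good calibration); 2*acc - 1 lower-bounds the total variation
    distance between the two distributions.
    """
    X = np.vstack([joint_feats, inference_feats])
    y = np.r_[np.zeros(len(joint_feats)), np.ones(len(inference_feats))]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    acc = GradientBoostingClassifier().fit(X_tr, y_tr).score(X_te, y_te)
    return max(0.0, 2.0 * acc - 1.0)
```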


MoMo: Momentum Models for Adaptive Learning Rates

Fabian Schaipp, R. Ohana, M. Eickenberg, Aaron Defazio, R. M. Gower

Training a modern machine learning architecture on a new task requires extensive learning-rate tuning, which comes at a high computational cost. Here we develop new adaptive learning rates that can be used with any momentum method and require less tuning to perform well. We first develop MoMo, a Momentum-Model-based adaptive learning rate for SGD-M (stochastic gradient descent with momentum). MoMo uses momentum estimates of the batch losses and gradients sampled at each iteration to build a model of the loss function. Our model also makes use of any known lower bound of the loss function by using truncation; e.g., most losses are lower-bounded by zero. We then approximately minimize this model at each iteration to compute the next step. We show how MoMo can be used in combination with any momentum-based method, and showcase this by developing MoMo-Adam, which is Adam with our new model-based adaptive learning rate. Additionally, for losses with unknown lower bounds, we develop on-the-fly estimates of a lower bound that are incorporated into our model. Through extensive numerical experiments, we demonstrate that MoMo and MoMo-Adam improve over SGD-M and Adam in terms of accuracy and robustness to hyperparameter tuning for training image classifiers on MNIST, CIFAR10, CIFAR100, ImageNet, recommender systems on the Criteo dataset, and a transformer model on the translation task IWSLT14.
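
The following sketch collapses the idea into a Polyak-style truncated step driven by momentum averages of losses and gradients; it is a simplification for intuition, not the paper's exact MoMo update, and grad_and_loss is a hypothetical oracle returning the batch loss and gradient.

```python
import numpy as np

def momo_sgd_sketch(grad_and_loss, x0, lr_max=1.0, beta=0.9, lb=0.0, n_steps=100):
    x = np.asarray(x0, dtype=float)
    d = np.zeros_like(x)   # momentum (averaged gradient)
    f_bar = None           # averaged loss estimate
    for _ in range(n_steps):
        f, g = grad_and_loss(x)
        d = beta * d + (1 - beta) * g
        f_bar = f if f_bar is None else beta * f_bar + (1 - beta) * f
        # Model-based step size: distance along d predicted to reach
        # the known lower bound lb, truncated at lr_max.
        step = min(lr_max, max(f_bar - lb, 0.0) / (d @ d + 1e-12))
        x = x - step * d
    return x
```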


Variational Inference with Gaussian Score Matching

Variational inference (VI) is a method to approximate the computationally intractable posterior distributions that arise in Bayesian statistics. Typically, VI fits a simple parametric distribution to be close to the target posterior, optimizing an appropriate objective such as the evidence lower bound (ELBO). In this work, we present a new approach to VI. Our method is based on the principle of score matching: namely, that if two distributions are equal then their score functions (i.e., gradients of the log density) are equal at every point on their support. With this principle, we develop score-matching VI, an iterative algorithm that seeks to match the scores between the variational approximation and the exact posterior. At each iteration, score-matching VI solves an inner optimization, one that minimally adjusts the current variational estimate to match the scores at a newly sampled value of the latent variables. We show that when the variational family is Gaussian, this inner optimization enjoys a closed-form solution, which we call Gaussian score matching VI (GSM-VI). GSM-VI is a "black box" variational algorithm in that it only requires a differentiable joint distribution, and as such it can be applied to a wide class of models. We compare GSM-VI to black box variational inference (BBVI), which has similar requirements but instead optimizes the ELBO. We first study how GSM-VI behaves as a function of the problem dimensionality, the condition number of the target covariance matrix (when the target is Gaussian), and the degree of mismatch between the approximating and exact posterior distributions. We then study GSM-VI on a collection of real-world Bayesian inference problems from the posteriorDB database of datasets and models. We find that GSM-VI is faster than BBVI and equally or more accurate. Specifically, over a wide range of target posteriors, GSM-VI requires 10-100x fewer gradient evaluations than BBVI to obtain a comparable quality of approximation.
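
The closed form is available because a Gaussian's score is linear in the latent variable: for $q = \mathcal{N}(\mu, \Sigma)$,

\[
\nabla_z \log q(z) = -\Sigma^{-1}(z - \mu),
\]

and GSM-VI adjusts $\mu$ and $\Sigma$ so that this matches the target's score at sampled points. Since $\nabla_z \log p(z \mid x) = \nabla_z \log p(z, x)$ (the normalizing constant does not depend on $z$), only the differentiable joint is needed, which is why the method is black box.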


Compressing the memory variables in constant-Q viscoelastic wave propagation via an improved sum-of-exponentials approximation

Xu Guo, S. Jiang, Yunfeng Xiong, Jiwei Zhang

The Earth introduces strong attenuation and dispersion into propagating waves. The time-fractional wave equation with very small fractional exponent, based on Kjartansson's constant-Q theory, is widely recognized in the field of geophysics as a reliable model for frequency-independent Q anelastic behavior. Nonetheless, the numerical resolution of this equation poses considerable challenges due to the requirement of storing the complete time history of the wavefields. To address this computational challenge, we present a novel approach: a nearly optimal sum-of-exponentials (SOE) approximation to the Caputo fractional derivative with very small fractional exponent, utilizing the machinery of generalized Gaussian quadrature. This method minimizes the number of memory variables needed to approximate the power attenuation law within a specified error tolerance. We establish a mathematical equivalence between this SOE approximation and the continuous fractional stress-strain relationship, relating it to the generalized Maxwell body model. Furthermore, we prove an improved SOE approximation error bound to thoroughly assess the ability of rheological models to replicate the power attenuation law. Numerical simulations of the constant-Q viscoacoustic equation in 3D homogeneous media and of variable-order P- and S-wave viscoelastic equations in 3D inhomogeneous media demonstrate that our proposed technique accurately captures changes in amplitude and phase resulting from material anelasticity. This advancement is a significant step toward the practical use of the time-fractional wave equation in seismic inversion.
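
The practical payoff of an SOE approximation is that the history integral collapses into a handful of memory variables updated by local recurrences. A sketch with a first-order quadrature follows; the weights and rates are placeholders for the generalized Gaussian quadrature nodes constructed in the paper.

```python
import numpy as np

def soe_history(f_vals, dt, weights, rates):
    """Evaluate int_0^t K(t - s) f(s) ds on a uniform grid for a kernel
    K(t) ~ sum_i weights[i] * exp(-rates[i] * t).

    Each memory variable obeys I_i(t + dt) = exp(-rates[i] * dt) I_i(t)
    + (local increment), so no full time history is stored.
    """
    mem = np.zeros(len(weights))            # memory variables I_i
    decay = np.exp(-np.asarray(rates) * dt)
    out = np.empty(len(f_vals))
    for n, f in enumerate(f_vals):
        mem = decay * mem + dt * f          # rectangle-rule increment
        out[n] = np.dot(weights, mem)
    return out
```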


A Dual-space Multilevel Kernel-splitting Framework for Discrete and Continuous Convolution

We introduce a new class of multilevel, adaptive, dual-space methods for computing fast convolutional transforms. These methods can be applied to a broad class of kernels, from the Green's functions for classical partial differential equations (PDEs) to power functions and radial basis functions such as those used in statistics and machine learning. The DMK (dual-space multilevel kernel-splitting) framework uses a hierarchy of grids, computing a smoothed interaction at the coarsest level, followed by a sequence of corrections at finer and finer scales until the problem is entirely local, at which point direct summation is applied. The main novelty of DMK is that the interaction at each scale is diagonalized by a short Fourier transform, permitting the use of separation of variables, but without requiring the FFT for its asymptotic performance. The DMK framework substantially simplifies the algorithmic structure of the fast multipole method (FMM) and unifies the FMM, Ewald summation, and multilevel summation, achieving speeds comparable to the FFT in work per gridpoint, even in a fully adaptive context. For continuous source distributions, the evaluation of local interactions is further accelerated by approximating the kernel at the finest level as a sum of Gaussians with a highly localized remainder. The Gaussian convolutions are calculated using tensor product transforms, and the remainder term is calculated using asymptotic methods. We illustrate the performance of DMK for both continuous and discrete sources with extensive numerical examples in two and three dimensions.
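
The splitting principle at the heart of DMK can be illustrated with the Coulomb kernel in the Ewald spirit: the kernel separates into a smooth part that lives on a coarse grid and a rapidly decaying part handled by local direct summation. The sketch below shows only this one-level split, not the multilevel DMK hierarchy.

```python
import numpy as np
from scipy.special import erf, erfc

def split_coulomb(r, sigma):
    """One-level Ewald-style splitting of the 1/r kernel (r > 0).

    The 'smooth' part is slowly varying and can be resolved on a coarse
    grid; the 'local' part is negligible beyond a few sigma and is
    summed directly over nearby sources.
    """
    smooth = erf(r / sigma) / r
    local = erfc(r / sigma) / r
    return smooth, local
```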


A new provably stable weighted state redistribution algorithm

We propose a practical finite volume method on cut cells using state redistribution. Our algorithm is provably monotone, total variation diminishing, and GKS stable in many situations, and it shuts off smoothly as the cut-cell size approaches a target value. Our analysis reveals why the original state redistribution method works so well: it results in a monotone scheme for most configurations, though at times subject to a slightly smaller CFL limit. Our analysis also explains why a pre-merging step is beneficial. We show computational experiments in two and three dimensions.
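
For orientation, here is a sketch of the original (unweighted) state redistribution step that the weighted algorithm generalizes, reconstructed from the standard description and therefore approximate in its details; nbhds is an assumed precomputed set of merging neighborhoods.

```python
import numpy as np

def state_redistribution(q_hat, vols, nbhds):
    """Sketch of the original (unweighted) state redistribution step.

    q_hat: provisional finite-volume update (may be unstable on tiny
           cut cells); vols: cell volumes; nbhds[i]: integer array of
           the cells merged into cell i's neighborhood (a regular
           cell's neighborhood is just [i]).
    """
    q_hat = np.asarray(q_hat, dtype=float)
    vols = np.asarray(vols, dtype=float)
    n = len(q_hat)
    # N[j]: number of neighborhoods that contain cell j.
    N = np.zeros(n)
    for m in nbhds:
        N[m] += 1
    # Neighborhood averages, each cell weighted by volume / overlap count.
    y = np.empty(n)
    for i, m in enumerate(nbhds):
        w = vols[m] / N[m]
        y[i] = np.dot(w, q_hat[m]) / w.sum()
    # Redistribute: each cell averages the neighborhood values it
    # participates in; this step is conservative by construction.
    q_new = np.zeros(n)
    for i, m in enumerate(nbhds):
        q_new[m] += y[i] / N[m]
    return q_new
```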
