Publications

Tensor hypercontraction for fully self-consistent imaginary-time GF2 and GWSOX methods: Theory, implementation, and role of the Green’s function second-order exchange for intermolecular interactions

We present an efficient MPI-parallel algorithm and its implementation for evaluating the self-consistent correlated second-order exchange term (SOX), which is employed as a correction to the fully self-consistent GW scheme called scGWSOX (GW plus the SOX term iterated to achieve full Green's function self-consistency). Due to the application of tensor hypercontraction (THC) in our computational procedure, the scaling of the evaluation of scGWSOX is reduced from $O(n_\tau n_{AO}^5)$ to $O(n_\tau N^2 n_{AO}^2)$. This fully MPI-parallel and THC-adapted approach enabled us to conduct the largest fully self-consistent scGWSOX calculations to date, with over 1100 atomic orbitals and only negligible errors attributable to the THC fitting. Utilizing our THC implementation for scGW, scGF2, and scGWSOX, we evaluated energies of intermolecular interactions. This approach allowed us to circumvent issues related to reference dependence and ambiguity in energy evaluation, which are common challenges in non-self-consistent calculations. We demonstrate that scGW exhibits a slight overbinding tendency for large systems, contrary to the underbinding observed with non-self-consistent RPA. Conversely, scGWSOX exhibits a slight underbinding tendency for such systems. This behavior is both physical and systematic and is caused by exclusion-principle-violating diagrams or corresponding corrections. Our analysis elucidates the role played by these different diagrams, which is crucial for the construction of rigorous, accurate, and systematic methods. Finally, we explicitly show that all perturbative fully self-consistent Green's function methods are size-extensive and size-consistent.
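The scaling reduction comes from the tensor hypercontraction format, which expresses the four-index electron-repulsion integrals through low-rank grid factors so that contractions never have to touch an $O(n_{AO}^4)$ object. Below is a minimal NumPy sketch of the THC format with random placeholder factors; real THC tensors come from a least-squares fit to the actual integrals, and the dimensions here are illustrative only.

```python
import numpy as np

# Dimensions: nao orbitals, N THC grid points (typically N ~ a few * nao).
nao, N = 20, 60
rng = np.random.default_rng(0)

# THC factors: (pq|rs) ~= sum_{PQ} X[p,P] X[q,P] Z[P,Q] X[r,Q] X[s,Q]
X = rng.standard_normal((nao, N))
Z = rng.standard_normal((N, N))
Z = 0.5 * (Z + Z.T)  # keep the kernel symmetric

D = rng.standard_normal((nao, nao))
D = 0.5 * (D + D.T)  # a symmetric "density-like" matrix

# A Coulomb-type contraction J[p,q] = sum_{rs} (pq|rs) D[r,s] done naively
# requires the O(nao^4) integral tensor. In THC form the same contraction
# runs through small grid-sized intermediates instead:
t = np.einsum('rQ,sQ,rs->Q', X, X, D)    # collapse D onto grid points
u = Z @ t                                 # O(N^2) kernel application
J_thc = np.einsum('pP,qP,P->pq', X, X, u)

# Reference: build the full tensor once to check the factorized result.
eri = np.einsum('pP,qP,PQ,rQ,sQ->pqrs', X, X, Z, X, X)
J_ref = np.einsum('pqrs,rs->pq', eri, D)
assert np.allclose(J_thc, J_ref)
```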
August 1, 2024

Nested R̂: Assessing the Convergence of Markov Chain Monte Carlo When Running Many Short Chains

C. Margossian, Matthew D. Hoffman, Pavel Sountsov, Lionel Riou-Durand, Aki Vehtari, Andrew Gelman

Recent developments in parallel Markov chain Monte Carlo (MCMC) algorithms allow us to run thousands of chains almost as quickly as a single chain, using hardware accelerators such as GPUs. While each chain still needs to forget its initial point during a warmup phase, the subsequent sampling phase can be shorter than in classical settings, where we run only a few chains. To determine if the resulting short chains are reliable, we need to assess how close the Markov chains are to their stationary distribution after warmup. The potential scale reduction factor R̂ is a popular convergence diagnostic but unfortunately can require a long sampling phase to work well. We present a nested design to overcome this challenge and a generalization called nested R̂. This new diagnostic works under conditions similar to R̂ and completes the workflow for GPU-friendly samplers. In addition, the proposed nesting provides theoretical insights into the utility of R̂, in both the classical and short-chain regimes.
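The construction groups chains into superchains and compares variation across superchains to variation within them. The sketch below is one simplified reading of that design in NumPy; the paper's exact estimator includes finite-sample corrections and weighting choices that this omits.

```python
import numpy as np

def nested_rhat(draws):
    """Nested R-hat-style diagnostic (simplified sketch).

    draws: array of shape (K, M, N) -- K superchains, each holding
    M chains of N post-warmup draws of a scalar quantity.
    """
    K, M, N = draws.shape
    chain_means = draws.mean(axis=2)          # (K, M)
    super_means = chain_means.mean(axis=1)    # (K,)

    # Between-superchain variance of the superchain means.
    B = super_means.var(ddof=1)

    # Within-superchain variance: average within-chain variance plus
    # the between-chain variance inside each superchain.
    within_chain = draws.var(axis=2, ddof=1).mean()
    between_chain = chain_means.var(axis=1, ddof=1).mean()
    W = within_chain + between_chain

    return np.sqrt(1.0 + B / W)

# Example: 4 superchains x 32 chains x 50 draws from the same target
# should give a value close to 1.
rng = np.random.default_rng(1)
print(nested_rhat(rng.standard_normal((4, 32, 50))))
```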


Amortized template-matching of molecular conformations from cryo-electron microscopy images using simulation-based inference

Lars Dingeldein, David Silva-Sánchez, L. Evans, P. Cossio, et al.

Biomolecules undergo conformational changes to perform their function. Cryo-electron microscopy (cryo-EM) can capture snapshots of biomolecules in various conformations. However, these images are noisy and display the molecule in unknown orientations, making it difficult to separate conformational differences from differences due to noise or projection directions. Here, we introduce cryo-EM simulation-based inference (cryoSBI) to infer the conformations of biomolecules and the uncertainties associated with the inference from individual cryo-EM images. CryoSBI builds on simulation-based inference, a combination of physics-based simulations and probabilistic deep learning, allowing us to use Bayesian inference even when likelihoods are too expensive to calculate. We begin with an ensemble of conformations, which can be templates from molecular simulations or modelling, and use them as structural hypotheses. We train a neural network approximating the Bayesian posterior using simulated images from these templates, and then use it to accurately infer the conformations of biomolecules from experimental images. Training is done only once, and after that, it takes just a few milliseconds to perform inference on an image, making cryoSBI suitable for arbitrarily large datasets. CryoSBI eliminates the need to estimate particle pose and imaging parameters, significantly enhancing the computational speed in comparison to explicit likelihood methods. We illustrate and benchmark cryoSBI on synthetic data and showcase its promise on experimental single-particle cryo-EM data.
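The core training loop pairs a simulator with a neural posterior estimate. Here is a toy PyTorch sketch of that loop, in which a 1-D "conformation" coordinate and a Gaussian posterior head stand in for cryoSBI's actual image simulator and embedding network; everything below is illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Toy stand-in for the cryo-EM simulator: a 1-D "conformation" theta
# rendered into a noisy signal with a random shift (nuisance pose).
def simulate(theta, n_pix=32):
    grid = torch.linspace(-4, 4, n_pix)
    shift = 2 * torch.rand(theta.shape[0], 1) - 1          # unknown pose
    clean = torch.exp(-(grid - theta - shift) ** 2)
    return clean + 0.3 * torch.randn_like(clean)

# Amortized posterior network: image -> Gaussian over theta.
net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    theta = 4 * torch.rand(256, 1) - 2                     # draw from prior
    images = simulate(theta)
    mu, log_sigma = net(images).chunk(2, dim=1)
    # Negative log-likelihood of the true parameter under the predicted
    # posterior -- the standard neural posterior estimation objective.
    loss = (log_sigma + 0.5 * ((theta - mu) / log_sigma.exp()) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# After training, inference on a new image is a single forward pass.
test_image = simulate(torch.tensor([[0.5]]))
mu, log_sigma = net(test_image).chunk(2, dim=1)
print(f"posterior: {mu.item():.2f} +/- {log_sigma.exp().item():.2f}")
```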

2024

MoMo: Momentum Models for Adaptive Learning Rates

Fabian Schaipp, R. Ohana, M. Eickenberg, Aaron Defazio, R. M. Gower

Training a modern machine learning architecture on a new task requires extensive learning-rate tuning, which comes at a high computational cost. Here we develop new Polyak-type adaptive learning rates that can be used on top of any momentum method, and require less tuning to perform well. We first develop MoMo, a Momentum Model based adaptive learning rate for SGD-M (stochastic gradient descent with momentum). MoMo uses momentum estimates of the batch losses and gradients sampled at each iteration to build a model of the loss function. Our model also makes use of any known lower bound of the loss function by using truncation, e.g. most losses are lower-bounded by zero. The model is then approximately minimized at each iteration to compute the next step. We show how MoMo can be used in combination with any momentum-based method, and showcase this by developing MoMo-Adam, which is Adam with our new model-based adaptive learning rate. We show that MoMo attains an $\mathcal{O}(1/\sqrt{K})$ convergence rate for convex problems with interpolation, requiring knowledge of no problem-specific quantities other than the optimal value. Additionally, for losses with unknown lower bounds, we develop on-the-fly estimates of a lower bound that are incorporated in our model. We demonstrate that MoMo and MoMo-Adam improve over SGD-M and Adam in terms of robustness to hyperparameter tuning for training image classifiers on MNIST, CIFAR, and ImageNet, for recommender systems on the Criteo dataset, for a transformer model on the translation task IWSLT14, and for a diffusion model.
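The model-based step can be sketched as a Polyak-type step size built from momentum averages of losses, gradients, and iterate inner products, truncated at the known lower bound. The NumPy sketch below follows that construction in simplified form; the constants and averaging details are placeholders rather than the paper's exact update.

```python
import numpy as np

def momo_sgdm(grad_and_loss, x, steps=200, alpha=1.0, beta=0.9, f_star=0.0):
    """Sketch of a MoMo-style Polyak step on top of SGD with momentum.

    grad_and_loss(x) returns a stochastic (gradient, loss) pair.
    f_star is a known lower bound on the loss (e.g. 0 for many losses).
    """
    g, f = grad_and_loss(x)
    d = g.copy()                 # momentum average of gradients
    f_bar = f                    # momentum average of losses
    gamma = g @ x                # momentum average of <g_k, x_k>
    for _ in range(steps):
        # Model-based Polyak step size, truncated at the lower bound
        # and capped at a maximal learning rate alpha.
        num = max(f_bar + d @ x - gamma - f_star, 0.0)
        tau = min(alpha, num / (d @ d + 1e-12))
        x = x - tau * d
        g, f = grad_and_loss(x)
        d = beta * d + (1 - beta) * g
        f_bar = beta * f_bar + (1 - beta) * f
        gamma = beta * gamma + (1 - beta) * (g @ x)
    return x

# Toy check: noisy least squares whose minimal loss is 0.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 10)), np.zeros(50)
def gl(x):
    r = A @ x - b
    return 2 * A.T @ r / len(b) + 0.01 * rng.standard_normal(10), r @ r / len(b)
print(gl(momo_sgdm(gl, rng.standard_normal(10)))[1])  # loss near 0
```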


Dynamical arrest in active nematic turbulence

I. Lavi, Ricard Alert, et al.

Active fluids display spontaneous turbulent-like flows known as active turbulence. Recent work revealed that these flows have universal features, independent of the material properties and of the presence of topological defects. However, the differences between defect-laden and defect-free active turbulence remain largely unexplored. Here, by means of large-scale numerical simulations, we show that defect-free active nematic turbulence can undergo dynamical arrest. We find that flow alignment -- the tendency of nematics to reorient under shear -- enhances large-scale jets in contractile rodlike systems while promoting arrested flow patterns in extensile systems. Our results reveal a mechanism of labyrinthine pattern formation produced by an emergent topology of nematic domain walls that partially suppresses chaotic flows. Taken together, our findings call for the experimental realization of defect-free active nematics, and suggest that topological defects enable turbulence by preventing dynamical arrest.

July 21, 2024

SILVER: Single-loop variance reduction and application to federated learning

Kazusato Oko, Shunta Akiyama, D. Wu, Tomoya Murata, Taiji Suzuki

Most variance-reduction methods require multiple full-gradient computations, which is time-consuming and hence a bottleneck in applications to distributed optimization. We present a single-loop variance-reduced gradient estimator named SILVER (SIngle-Loop VariancE-Reduction) for finite-sum non-convex optimization, which does not require multiple full gradients but nevertheless achieves the optimal gradient complexity. Notably, unlike existing methods, SILVER provably reaches second-order optimality, with exponential convergence in the Polyak-Łojasiewicz (PL) region, and achieves further speedup depending on the data heterogeneity. Owing to these advantages, SILVER serves as a new base method for designing communication-efficient federated learning algorithms: we combine SILVER with local updates, which gives the best communication rounds and number of communicated gradients across the full range of Hessian heterogeneity while, at the same time, guaranteeing second-order optimality and exponential convergence in the PL region.
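SILVER's estimator itself is specified in the paper; to illustrate the single-loop idea it builds on, here is a STORM-style recursive variance-reduced estimator in NumPy, which corrects the gradient estimate from one minibatch per step instead of recomputing periodic full gradients. This is a sketch of the general technique, not SILVER.

```python
import numpy as np

def storm_style_vr(grad_i, n, x, steps=500, lr=0.05, a=0.1):
    """Single-loop recursive variance reduction (STORM-style sketch).

    grad_i(i, x): gradient of the i-th component function at x.
    No periodic full-gradient recomputation: the estimate v is
    corrected recursively from one sampled component per step.
    """
    rng = np.random.default_rng(0)
    i = rng.integers(n)
    v = grad_i(i, x)                       # initial estimate
    for _ in range(steps):
        x_new = x - lr * v
        i = rng.integers(n)
        # Recursive correction: evaluate the same sample at both points.
        v = grad_i(i, x_new) + (1 - a) * (v - grad_i(i, x))
        x = x_new
    return x

# Toy finite sum: f(x) = (1/n) sum_i (x - c_i)^2 with minimizer mean(c).
c = np.linspace(-1, 3, 20)
sol = storm_style_vr(lambda i, x: 2 * (x - c[i]), len(c), x=np.array(0.0))
print(sol, c.mean())  # the two values should be close
```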


Learning Associative Memories with Gradient Descent

Vivien Cabannes, B. Şimşek, A. Bietti

This work focuses on the training dynamics of one associative memory module storing outer products of token embeddings. We reduce this problem to the study of a system of particles, which interact according to properties of the data distribution and correlations between embeddings. Through theory and experiments, we provide several insights. In overparameterized regimes, we obtain logarithmic growth of the “classification margins.” Yet, we show that imbalance in token frequencies and memory interferences due to correlated embeddings lead to oscillatory transitory regimes. The oscillations are more pronounced with large step sizes, which can create benign loss spikes, although these learning rates speed up the dynamics and accelerate the asymptotic convergence. We also find that underparameterized regimes lead to suboptimal memorization schemes. Finally, we assess the validity of our findings on small Transformer models.
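As a concrete toy version of this setup, one can train a single matrix storing outer products of token embeddings with cross-entropy and gradient descent, then read off the classification margins. In the NumPy sketch below, the embeddings, token frequencies, and step size are illustrative choices, not the paper's experimental configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_in, n_out = 64, 10, 10          # embedding dim, input/output tokens
E = rng.standard_normal((n_in, d)) / np.sqrt(d)   # input embeddings
U = rng.standard_normal((n_out, d)) / np.sqrt(d)  # output embeddings
target = rng.permutation(n_out)       # token i should map to target[i]
freq = rng.dirichlet(np.ones(n_in))   # imbalanced token frequencies

W = np.zeros((d, d))                  # the associative memory module
lr = 1.0
for step in range(2000):
    i = rng.choice(n_in, p=freq)      # sample a token by frequency
    logits = U @ W @ E[i]             # scores for each output token
    p = np.exp(logits - logits.max()); p /= p.sum()
    y = np.zeros(n_out); y[target[i]] = 1.0
    # The cross-entropy gradient in W is an outer product of embeddings:
    W -= lr * np.outer(U.T @ (p - y), E[i])

# Classification margin per token: correct logit minus best other logit.
logits = E @ W.T @ U.T                # (n_in, n_out)
margins = [logits[i, target[i]] - np.max(np.delete(logits[i], target[i]))
           for i in range(n_in)]
print(np.round(margins, 2))           # rare tokens tend to lag behind
```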


Listening to the noise: Blind Denoising with Gibbs Diffusion

David Heurtel-Depeiges, C. Margossian, R. Ohana, B. Régaldo-Saint Blancard

In recent years, denoising problems have become intertwined with the development of deep generative models. In particular, diffusion models are trained like denoisers, and the distribution they model coincides with denoising priors in the Bayesian picture. However, denoising through diffusion-based posterior sampling requires the noise level and covariance to be known, preventing blind denoising. We overcome this limitation by introducing Gibbs Diffusion (GDiff), a general methodology addressing posterior sampling of both the signal and the noise parameters. Assuming arbitrary parametric Gaussian noise, we develop a Gibbs algorithm that alternates sampling steps from a conditional diffusion model trained to map the signal prior to the class of noise distributions, and a Monte Carlo sampler to infer the noise parameters. Our theoretical analysis highlights potential pitfalls, guides diagnostic usage, and quantifies errors in the Gibbs stationary distribution caused by the diffusion model. We showcase our method for 1) blind denoising of natural images involving colored noises with unknown amplitude and exponent, and 2) a cosmology problem, namely the analysis of cosmic microwave background data, where Bayesian inference of "noise" parameters means constraining models of the evolution of the Universe.
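The Gibbs structure can be sketched as an alternation between the two conditional samplers. In the sketch below, both sampler functions are hypothetical placeholders: a conjugate Gaussian toy stands in for the conditional diffusion model, and a random-walk Metropolis step on the noise variance stands in for the paper's Monte Carlo sampler.

```python
import numpy as np

def sample_signal_given_noise(y, phi, rng):
    # Placeholder for posterior sampling with a conditional diffusion
    # model, x ~ p(x | y, phi). Toy: x_i ~ N(0,1), y_i = x_i + N(0, phi),
    # so the exact conditional is Gaussian.
    var = 1.0 / (1.0 + 1.0 / phi)
    return rng.normal(var * y / phi, np.sqrt(var))

def sample_noise_given_signal(y, x, rng, phi_old):
    # Placeholder for a Monte Carlo move on the noise parameter:
    # random-walk Metropolis on log(phi), flat prior on log(phi).
    r = y - x
    def loglik(phi):
        return -0.5 * np.sum(r**2) / phi - 0.5 * r.size * np.log(phi)
    prop = phi_old * np.exp(0.2 * rng.standard_normal())
    accept = np.log(rng.uniform()) < loglik(prop) - loglik(phi_old)
    return prop if accept else phi_old

rng = np.random.default_rng(0)
y = rng.normal(0.0, np.sqrt(1.5), size=200)  # signal var 1 + noise var 0.5
phi, chain = 1.0, []
for it in range(2000):
    x = sample_signal_given_noise(y, phi, rng)   # Gibbs step 1
    phi = sample_noise_given_signal(y, x, rng, phi)  # Gibbs step 2
    chain.append(phi)
print("posterior mean of noise variance:", np.mean(chain[500:]))
```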


Cytoplasmic stirring by active carpets

B. Chakrabarti, M. Rachh, S. Shvartsman, M. Shelley

Large cells often rely on cytoplasmic flows for intracellular transport, maintaining homeostasis, and positioning cellular components. Understanding the mechanisms of these flows is essential for gaining insights into cell function, developmental processes, and evolutionary adaptability. Here, we focus on a class of self-organized cytoplasmic stirring mechanisms that result from fluid–structure interactions between cytoskeletal elements at the cell cortex. Drawing inspiration from streaming flows in late-stage fruit fly oocytes, we propose an analytically tractable active carpet theory. This model deciphers the origins and three-dimensional spatiotemporal organization of such flows. Through a combination of simulations and weakly nonlinear theory, we establish the pathway of the streaming flow to its global attractor: a cell-spanning vortical twister. Our study reveals the inherent symmetries of this emergent flow, its low-dimensional structure, and illustrates how complex fluid–structure interaction aligns with classical solutions in Stokes flow. This framework can be easily adapted to elucidate a broad spectrum of self-organized, cortex-driven intracellular flows.


Amortized Variational Inference: When and Why?

C. Margossian, D. Blei

In a probabilistic latent variable model, factorized (or mean-field) variational inference (F-VI) fits a separate parametric distribution for each latent variable. Amortized variational inference (A-VI) instead learns a common inference function, which maps each observation to its corresponding latent variable's approximate posterior. Typically, A-VI is used as a cog in the training of variational autoencoders; however, it stands to reason that A-VI could also serve as a general alternative to F-VI. In this paper we study when and why A-VI can be used for approximate Bayesian inference. We derive necessary, sufficient, and verifiable conditions on a latent variable model under which A-VI can attain F-VI's optimal solution, thereby closing the amortization gap. We prove these conditions are uniquely verified by simple hierarchical models, a broad class that encompasses many models in machine learning. We then show, on a broader class of models, how to expand the domain of A-VI's inference function to improve its solution, and we provide examples, e.g. hidden Markov models, where the amortization gap cannot be closed.
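The distinction can be made concrete in a toy Gaussian model: F-VI keeps one (mean, log-scale) pair per observation, while A-VI learns one shared network mapping each observation to those parameters. In the PyTorch sketch below, the model and architecture are illustrative; in this simple hierarchical toy the gap should close, consistent with the paper's conditions.

```python
import torch
import torch.nn as nn

# Toy model: z_i ~ N(0, 1), y_i ~ N(z_i, 0.5^2), mean-field Gaussian q.
torch.manual_seed(0)
n = 500
y = torch.randn(n) + 0.5 * torch.randn(n)

def elbo(mu, log_s):
    s2 = (2 * log_s).exp()
    # E_q[log p(y, z)] + entropy of q, all Gaussian, in closed form
    # (up to additive constants).
    log_p = -0.5 * ((y - mu)**2 + s2) / 0.25 - 0.5 * (mu**2 + s2)
    return (log_p + log_s).mean()

# F-VI: 2n free parameters, one (mu_i, log_sigma_i) per observation.
mu = torch.zeros(n, requires_grad=True)
log_s = torch.zeros(n, requires_grad=True)
# A-VI: one shared inference function y_i -> (mu_i, log_sigma_i).
net = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 2))

opt = torch.optim.Adam([mu, log_s] + list(net.parameters()), lr=0.05)
for step in range(1000):
    out = net(y.unsqueeze(1))
    loss = -elbo(mu, log_s) - elbo(out[:, 0], out[:, 1])
    opt.zero_grad(); loss.backward(); opt.step()

# If the amortization gap closes, the two ELBOs should nearly match.
print(f"F-VI ELBO: {elbo(mu, log_s):.4f}")
out = net(y.unsqueeze(1))
print(f"A-VI ELBO: {elbo(out[:, 0], out[:, 1]):.4f}")
```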
