2573 Publications

Spin-Flip Unitary Coupled Cluster Method: Toward Accurate Description of Strong Electron Correlation on Quantum Computers

Quantum computers have emerged as a promising platform to simulate the strong electron correlation that is crucial to catalysis and photochemistry. However, owing to the choice of trial wave function employed in the popular hybrid quantum-classical variational quantum eigensolver (VQE) algorithm, accurate simulation is restricted to certain classes of correlated phenomena. Herein, we combine the spin-flip (SF) formalism with the unitary coupled cluster with singles and doubles (UCCSD) method via the quantum equation-of-motion (qEOM) approach to allow for an efficient simulation of a large family of strongly correlated problems. In particular, we show that the developed qEOM-SF-UCCSD/VQE method outperforms its UCCSD/VQE counterpart for the simulation of the cis-trans isomerization of ethylene and the automerization of cyclobutadiene. The predicted qEOM-SF-UCCSD/VQE barrier heights for these two problems are in good agreement with the experimentally determined values. The methodological developments presented herein will further stimulate investigation of this approach for the simulation of other types of correlated/entangled phenomena on a quantum computer.
August 1, 2023
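
To make the VQE ingredient of the method above concrete, here is a minimal NumPy/SciPy sketch of the variational loop: a parameterized unitary built from a Hermitian (UCC-style) generator is optimized classically to minimize the energy expectation. The two-qubit Hamiltonian coefficients and the single excitation term are illustrative stand-ins, not the paper's qEOM-SF-UCCSD ansatz or its molecular systems.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

# Pauli matrices and a toy two-qubit Hamiltonian (hypothetical coefficients,
# not the ethylene/cyclobutadiene systems studied in the paper).
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Y = np.array([[0.0, -1.0j], [1.0j, 0.0]])
Z = np.diag([1.0, -1.0])
H = 0.5 * np.kron(Z, Z) + 0.3 * np.kron(X, I2) + 0.3 * np.kron(I2, X)

# UCC-style ansatz: |psi(t)> = exp(-i t G) |00>, with Hermitian generator G,
# so exp(-i t G) plays the role of exp(T - T^dagger) for a single excitation.
# One excitation term need not reach the exact ground state; this is a sketch.
G = np.kron(Y, X)

def energy(theta, psi0):
    """<psi(theta)|H|psi(theta)> -- the quantity a quantum device would
    estimate by repeated measurement; here computed exactly."""
    psi = expm(-1.0j * theta[0] * G) @ psi0
    return float(np.real(psi.conj() @ H @ psi))

psi0 = np.zeros(4); psi0[0] = 1.0          # |00> reference state
res = minimize(energy, x0=np.array([0.1]), args=(psi0,), method="COBYLA")
print("VQE energy:", res.fun)
print("exact ground energy:", np.linalg.eigvalsh(H)[0])
```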

On Single Index Models beyond Gaussian Data

L. Pillaud-Vivien, Joan Bruna, Ph.D., Aaron Zweig, Ph.D.

Sparse high-dimensional functions have arisen as a rich framework to study the behavior of gradient-descent methods using shallow neural networks, showcasing their ability to perform feature learning beyond linear models. Amongst those functions, the simplest are single-index models $f(x) = \phi( x \cdot \theta^*)$, where the labels are generated by an arbitrary non-linear scalar link function $\phi$ applied to an unknown one-dimensional projection $\theta^*$ of the input data. By focusing on Gaussian data, several recent works have built a remarkable picture, where the so-called information exponent (related to the regularity of the link function) controls the required sample complexity. In essence, these tools exploit the stability and spherical symmetry of Gaussian distributions. In this work, building on the framework of \cite{arous2020online}, we explore extensions of this picture beyond the Gaussian setting, where stability, symmetry, or both may be violated. Focusing on the planted setting where $\phi$ is known, our main results establish that Stochastic Gradient Descent can efficiently recover the unknown direction $\theta^*$ in the high-dimensional regime, under assumptions that extend previous works \cite{yehudai2020learning,wu2022learning}.
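
As a complement to the abstract, here is a small, self-contained sketch of online SGD for a planted single-index model, projected back onto the sphere after each step. The link function, dimensions, and step size are illustrative choices, and the data is Gaussian for simplicity, whereas the paper's point is precisely to relax that assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_steps, lr = 50, 20_000, 0.05
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)

phi = lambda z: z**2 + z      # known link (planted setting); illustrative choice
dphi = lambda z: 2 * z + 1

theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)
for _ in range(n_steps):
    x = rng.standard_normal(d)          # Gaussian data kept for simplicity
    y = phi(x @ theta_star)
    resid = phi(x @ theta) - y
    grad = resid * dphi(x @ theta) * x  # gradient of the squared loss in theta
    theta -= (lr / np.sqrt(d)) * grad   # online SGD step
    theta /= np.linalg.norm(theta)      # spherical projection

print("overlap |<theta, theta*>|:", abs(theta @ theta_star))
```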


Generative modeling via tensor train sketching

Yoonhaeng Hur, Jeremy G. Hoskins, Michael Lindsey, M. Stoudenmire, Yuehaw Khoo

In this paper, we introduce a sketching algorithm for constructing a tensor train representation of a probability density from its samples. Our method deviates from the standard recursive SVD-based procedure for constructing a tensor train. Instead, we formulate and solve a sequence of small linear systems for the individual tensor train cores. This approach can avoid the curse of dimensionality that threatens both the algorithmic and sample complexities of the recovery problem. Specifically, for Markov models under natural conditions, we prove that the tensor cores can be recovered with a sample complexity that scales logarithmically in the dimensionality. Finally, we illustrate the performance of the method with several numerical experiments.
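
The following toy NumPy sketch illustrates the flavor of the approach for a three-variable Markov chain: random sketch functions compress the "past" and "future" variables, and a single core is then obtained from a small least-squares solve on sketched empirical moments. The distributions, sketch sizes, and the rank-one slice structure exploited here are simplifying assumptions for illustration, not the paper's general algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
m, r, n = 4, 4, 50_000    # alphabet size, sketch dimension, samples (toy scales)

# Toy Markov chain x1 -> x2 -> x3, so the density has an exact low-rank TT form.
P1  = rng.dirichlet(np.ones(m))
T12 = rng.dirichlet(np.ones(m), size=m)      # p(x2 | x1), row-stochastic
T23 = rng.dirichlet(np.ones(m), size=m)      # p(x3 | x2)
x1 = rng.choice(m, n, p=P1)
x2 = np.array([rng.choice(m, p=T12[i]) for i in x1])
x3 = np.array([rng.choice(m, p=T23[k]) for k in x2])

# Random sketches compress the past (x1) and future (x3) to r dimensions,
# so everything estimated below has only O(m * r^2) entries.
U = rng.standard_normal((m, r))
V = rng.standard_normal((m, r))

# Sketched empirical moments:
#   M[k] ~ E[u(x1) v(x3)^T 1{x2=k}],  L[k] ~ E[u(x1) 1{x2=k}].
M = np.zeros((m, r, r)); L = np.zeros((m, r))
for i, k, j in zip(x1, x2, x3):
    M[k] += np.outer(U[i], V[j]); L[k] += U[i]
M /= n; L /= n

# One small least-squares system per slice for the middle core: for a Markov
# chain M[k] is rank one, M[k] = L[k] g_k^T with g_k = E[v(x3) | x2 = k].
G2 = np.stack([np.linalg.lstsq(L[k][:, None], M[k], rcond=None)[0][0]
               for k in range(m)])

# Sanity check against the population quantity T23 @ V; the gap shrinks with n.
print("max core error:", np.max(np.abs(G2 - T23 @ V)))
```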


Mapping disease regulatory circuits at cell-type resolution from single-cell multiomics data

Resolving chromatin-remodeling-linked gene expression changes at cell-type resolution is important for understanding disease states. Here we describe MAGICAL (Multiome Accessibility Gene Integration Calling and Looping), a hierarchical Bayesian approach that leverages paired single-cell RNA sequencing and single-cell transposase-accessible chromatin sequencing from different conditions to map disease-associated transcription factors, chromatin sites, and genes as regulatory circuits. By simultaneously modeling signal variation across cells and conditions in both omics data types, MAGICAL achieved high accuracy on circuit inference. We applied MAGICAL to study Staphylococcus aureus sepsis from peripheral blood mononuclear single-cell data that we generated from subjects with bloodstream infection and uninfected controls. MAGICAL identified sepsis-associated regulatory circuits predominantly in CD14 monocytes, known to be activated by bacterial sepsis. We addressed the challenging problem of distinguishing host regulatory circuit responses to methicillin-resistant and methicillin-susceptible S. aureus infections. Although differential expression analysis failed to show predictive value, MAGICAL identified epigenetic circuit biomarkers that distinguished methicillin-resistant from methicillin-susceptible S. aureus infections.


SGD with Large Step Sizes Learns Sparse Features

Maksym Andriushchenko, Aditya Varre, L. Pillaud-Vivien, Nicolas Flammarion

We showcase important features of the dynamics of Stochastic Gradient Descent (SGD) in the training of neural networks. We present empirical observations that commonly used large step sizes (i) may lead the iterates to jump from one side of a valley to the other, causing loss stabilization, and (ii) that this stabilization induces a hidden stochastic dynamics that implicitly biases the iterates toward simple predictors. Furthermore, we show empirically that the longer large step sizes keep SGD high in the loss landscape valleys, the better the implicit regularization can operate and find sparse representations. Notably, no explicit regularization is used: the regularization effect comes solely from the SGD dynamics influenced by the large step size schedule. Therefore, these observations unveil how, through the step size schedule, gradient and noise together drive the SGD dynamics through the loss landscape of neural networks. We justify these findings theoretically through the study of simple neural network models as well as qualitative arguments inspired by stochastic processes. This analysis allows us to shed new light on some common practices and observed phenomena when training deep networks. The code of our experiments is available at https://github.com/tml-epfl/sgd-sparse-features.
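
A minimal PyTorch illustration of the phenomenon described above: train the same small network with and without an initial large-step phase and compare how concentrated the first-layer weights are on the two relevant input coordinates. The architecture, task, and step sizes are illustrative assumptions (and may need tuning), not the paper's experimental setup; see the linked repository for the authors' code.

```python
import torch

torch.manual_seed(0)
n, d, h = 512, 20, 64
X = torch.randn(n, d)
y = (X[:, 0] - 2 * X[:, 1]).unsqueeze(1)       # only 2 of 20 inputs matter

def train(lr_big):
    # Large steps first (the "loss stabilization" phase), then a small-step
    # phase -- a crude stand-in for the schedules studied in the paper.
    net = torch.nn.Sequential(torch.nn.Linear(d, h), torch.nn.ReLU(),
                              torch.nn.Linear(h, 1))
    for lr, steps in [(lr_big, 4000), (0.005, 2000)]:
        opt = torch.optim.SGD(net.parameters(), lr=lr)
        for _ in range(steps):
            idx = torch.randint(0, n, (16,))
            loss = ((net(X[idx]) - y[idx]) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
    # Per-input "feature usage": column norms of the first-layer weight matrix.
    return net[0].weight.detach().norm(dim=0)

for lr_big in (0.1, 0.005):        # with vs. effectively without a large-step phase
    usage = train(lr_big)
    top2 = usage.topk(2).values.sum().item()
    rest = usage.sum().item() - top2
    print(f"lr_big={lr_big}: weight on top-2 inputs {top2:.2f}, on the rest {rest:.2f}")
```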


A Model-Based Method for Minimizing CVaR and Beyond

Si Yi Meng, R. M. Gower

We develop a variant of the stochastic prox-linear method for minimizing the Conditional Value-at-Risk (CVaR) objective. CVaR is a risk measure focused on minimizing worst-case performance, defined as the average of the top quantile of the losses. In machine learning, such a risk measure is useful for training more robust models. Although the stochastic subgradient method (SGM) is a natural choice for minimizing the CVaR objective, we show that our stochastic prox-linear (SPL+) algorithm can better exploit the structure of the objective while still providing a convenient closed-form update. Our SPL+ method also adapts to the scaling of the loss function, which allows for easier tuning. We then specialize a general convergence theorem for SPL+ to our setting and show that it allows for a wider selection of step sizes than SGM. We support this theoretical finding experimentally.
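
For concreteness, here is a small NumPy sketch of the CVaR objective in its Rockafellar-Uryasev form, minimized with the plain stochastic subgradient method (the SGM baseline the paper compares against; the SPL+ update itself is the paper's contribution and is not reproduced here). Problem sizes, the squared loss, and the step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, alpha, lr = 1000, 5, 0.9, 0.01

# Linear regression data with occasional large-noise samples, so the tail of
# the loss distribution matters (all choices here are illustrative).
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
noise = rng.standard_normal(n) * np.where(rng.random(n) < 0.05, 3.0, 1.0)
y = X @ w_true + noise

# Rockafellar-Uryasev form:
#   CVaR_alpha(loss) = min_eta  eta + E[(loss - eta)_+] / (1 - alpha),
# minimized by stochastic subgradient steps jointly in (w, eta).
w, eta = np.zeros(d), 0.0
for _ in range(50_000):
    i = rng.integers(n)
    r = X[i] @ w - y[i]
    loss = 0.5 * r**2
    ind = float(loss > eta)                  # subgradient of (loss - eta)_+
    w -= lr * (ind / (1 - alpha)) * r * X[i]
    eta -= lr * (1.0 - ind / (1 - alpha))

losses = 0.5 * (X @ w - y) ** 2
print("empirical CVaR_0.9:", np.sort(losses)[int(alpha * n):].mean())
```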


BridgeStan: Efficient in-memory access to the methods of a Stan model

Edward A. Roualdes, B. Ward, B. Carpenter, Adrian Seyboldt, Seth D. Axen

Stan provides a probabilistic programming language in which users can code Bayesian models (Carpenter et al., 2017; Stan Development Team, 2022). A Stan program is transpiled to a C++ class that links to the Stan math library to implement smooth, unconstrained posterior log densities, gradients, and Hessians, as well as constraining/unconstraining transforms. Implementation is provided through automatic differentiation in the Stan math library (Carpenter et al., 2015). BridgeStan provides in-memory access to the methods of Stan models through Python, Julia, R, and Rust. This allows algorithm development in these languages with the numerical efficiency and expressiveness of Stan models. Furthermore, these features are exposed through a language-agnostic C API, allowing foreign function interfaces in other languages to utilize BridgeStan with minimal additional development.
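
A short usage sketch of the Python interface, following the pattern in the BridgeStan documentation; the model and data file paths are placeholders, and method names should be checked against your installed version.

```python
import numpy as np
import bridgestan as bs

# Paths are placeholders; from_stan_file compiles the model if necessary.
model = bs.StanModel.from_stan_file("bernoulli.stan", "bernoulli.data.json")

# Unconstrained log density and gradient, evaluated in memory -- exactly what
# an external sampler or optimizer written in Python needs from a Stan model.
theta_unc = np.random.standard_normal(model.param_unc_num())
lp, grad = model.log_density_gradient(theta_unc, propto=True, jacobian=True)
print("log density:", lp)
print("gradient:", grad)

# Map from the unconstrained to the constrained parameterization.
print(dict(zip(model.param_names(), model.param_constrain(theta_unc))))
```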


Maintaining symmetry during body axis elongation

Celia M. Smits, Sayantan Dutta, Vishank Jain-Sharma, Sebastian J. Streichan, S. Shvartsman

Bilateral symmetry defines much of the animal kingdom and is crucial for numerous functions of bilaterian organisms. Genetic approaches have discovered highly conserved patterning networks that establish bilateral symmetry in early embryos [1], but how this symmetry is maintained throughout subsequent morphogenetic events remains largely unknown [2]. Here we show that the terminal patterning system—which relies on Ras/ERK signaling through activation of the Torso receptor by its ligand Trunk [3]—is critical for preserving bilateral symmetry during Drosophila body axis elongation, a process driven by cell rearrangements in the two identical lateral regions of the embryo and specified by the dorsal-ventral and anterior-posterior patterning systems [4]. We demonstrate that fluctuating asymmetries in this rapid convergent-extension process are attenuated in normal embryos over time, possibly through noise-dissipating forces from the posterior midgut invagination and movement. However, when Torso signaling is attenuated via mutation of Trunk or RNAi directed against downstream Ras/ERK pathway components, body axis elongation results in a characteristic corkscrew phenotype [5], which reflects dramatic reorganization of global tissue flow and is incompatible with viability. Our results reveal a new function downstream of the Drosophila terminal patterning system in potentially active control of bilateral symmetry and should motivate a systematic search for similar symmetry-preserving regulatory mechanisms in other bilaterians.


Unlocking the Potential of Similarity Matching: Scalability, Supervision and Pre-training

Y. Bahroun, Shagesh Sridharan, Atithi Acharya, D. Chklovskii, A. Sengupta

While effective, the backpropagation (BP) algorithm exhibits limitations in terms of biological plausibility, computational cost, and suitability for online learning. As a result, there has been growing interest in developing alternative, biologically plausible learning approaches that rely on local learning rules. This study focuses on the primarily unsupervised similarity matching (SM) framework, which aligns with observed mechanisms in biological systems and offers online, localized, and biologically plausible algorithms. (i) To scale SM to large datasets, we propose a PyTorch implementation of convolutional nonnegative SM. (ii) We introduce a localized supervised SM objective, reminiscent of canonical correlation analysis, that facilitates stacking SM layers. (iii) We leverage the PyTorch implementation to pre-train architectures such as LeNet and compare the resulting features against those of BP-trained models. This work combines biologically plausible algorithms with computational efficiency, opening multiple avenues for further exploration.
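
For readers unfamiliar with the SM framework, here is a minimal NumPy sketch of the classic online similarity-matching network with Hebbian feedforward and anti-Hebbian lateral updates, the linear fully connected precursor of the convolutional nonnegative variant scaled up in this paper. The data model and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, T, eta = 10, 3, 20_000, 0.01

# Data concentrated near a 3-dimensional principal subspace.
basis = np.linalg.qr(rng.standard_normal((d, k)))[0]   # orthonormal d x k
scales = np.array([3.0, 2.0, 1.5])
def sample():
    return basis @ (scales * rng.standard_normal(k)) + 0.1 * rng.standard_normal(d)

W = 0.1 * rng.standard_normal((k, d))   # feedforward weights (Hebbian)
M = np.eye(k)                           # lateral weights (anti-Hebbian)

for _ in range(T):
    x = sample()
    y = np.linalg.solve(M, W @ x)       # fixed point of the recurrent dynamics
    W += eta * (np.outer(y, x) - W)     # local, online Hebbian update
    M += eta * (np.outer(y, y) - M)     # local, online anti-Hebbian update

# The effective map F = M^{-1} W should span the top principal subspace;
# the overlap below should approach 1 as training proceeds.
F = np.linalg.solve(M, W)                              # k x d
Q = np.linalg.qr(F.T)[0]                               # orthonormal row-space basis
print("subspace overlap (1.0 is perfect):", np.linalg.norm(Q.T @ basis) ** 2 / k)
```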


Kernelized Diffusion Maps

L. Pillaud-Vivien, Francis Bach, Ph.D.

Spectral clustering and diffusion maps are celebrated dimensionality-reduction algorithms built on eigen-elements related to the diffusive structure of the data. The core of these procedures is the approximation of a Laplacian through a graph-kernel approach; however, this local-average construction is known to be cursed by the high dimension $d$. In this article, we build a different estimator of the Laplacian, via a reproducing kernel Hilbert space method, which adapts naturally to the regularity of the problem. We provide non-asymptotic statistical rates proving that the kernel estimator we build can circumvent the curse of dimensionality. Finally, we discuss techniques (Nyström subsampling, Fourier features) that reduce the computational cost of the estimator while preserving its overall performance.
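
For context, the following NumPy sketch implements the standard graph-kernel diffusion maps construction that the paper takes as its starting point (the RKHS-based Laplacian estimator that replaces it is not reproduced here); the data, bandwidth, and embedding dimension are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Points on a noisy circle: the diffusion coordinates should recover the angle.
n = 400
t = rng.uniform(0.0, 2.0 * np.pi, n)
X = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.standard_normal((n, 2))

# Graph-kernel (local average) construction: Gaussian affinities, Markov
# normalization, then the leading eigenvectors give the diffusion map.
eps = 0.1                                   # kernel bandwidth (tuning parameter)
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
P = np.exp(-D2 / eps)
P /= P.sum(axis=1, keepdims=True)           # row-stochastic transition matrix

evals, evecs = np.linalg.eig(P)
order = np.argsort(-evals.real)
psi = evecs[:, order[1:3]].real             # first nontrivial eigenvectors

# On a circle, the two coordinates of psi trace out (cos, sin) of the angle.
print("top diffusion eigenvalues:", np.round(evals.real[order[:4]], 3))
```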
