2697 Publications

SGD with Large Step Sizes Learns Sparse Features

Maksym Andriushchenko, Aditya Varre, L. Pillaud-Vivien, Nicolas Flammarion

We showcase important features of the dynamics of stochastic gradient descent (SGD) in the training of neural networks. We present empirical observations that commonly used large step sizes (i) may lead the iterates to jump from one side of a valley to the other, causing loss stabilization, and (ii) that this stabilization induces hidden stochastic dynamics that implicitly bias the iterates toward simple predictors. Furthermore, we show empirically that the longer large step sizes keep SGD high in the loss landscape valleys, the better the implicit regularization can operate and find sparse representations. Notably, no explicit regularization is used: the regularization effect comes solely from the SGD dynamics influenced by the large step size schedule. Therefore, these observations unveil how, through the step size schedules, gradient and noise together drive the SGD dynamics through the loss landscape of neural networks. We justify these findings theoretically through the study of simple neural network models as well as qualitative arguments inspired by stochastic processes. This analysis allows us to shed new light on some common practices and observed phenomena when training deep networks. The code of our experiments is available at https://github.com/tml-epfl/sgd-sparse-features.

A Model-Based Method for Minimizing CVaR and Beyond

Si Yi Meng, R. M. Gower

We develop a variant of the stochastic prox-linear method for minimizing the Conditional Value-at-Risk (CVaR) objective. CVaR is a risk measure focused on minimizing worst-case performance, defined as the average of the top quantile of the losses. In machine learning, such a risk measure is useful to train more robust models. Although the stochastic subgradient method (SGM) is a natural choice for minimizing the CVaR objective, we show that our stochastic prox-linear (SPL+) algorithm can better exploit the structure of the objective, while still providing a convenient closed form update. Our SPL+ method also adapts to the scaling of the loss function, which allows for easier tuning. We then specialize a general convergence theorem for SPL+ to our setting, and show that it allows for a wider selection of step sizes compared to SGM. We support this theoretical finding experimentally.
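The definition of CVaR quoted above (the average of the top quantile of the losses) can be made concrete with a small sketch. This is not the SPL+ algorithm from the abstract; it only evaluates the objective on a toy sample. The helper names `empirical_cvar` and `ru_objective`, the level `alpha`, and the toy losses are illustrative assumptions. The second helper uses the standard Rockafellar-Uryasev variational form, whose minimum over the threshold `t` matches the tail average when `(1 - alpha) * n` is an integer.

```python
import math


def empirical_cvar(losses, alpha):
    """Mean of the worst (1 - alpha) fraction of the losses (empirical CVaR)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if not losses:
        raise ValueError("losses must be non-empty")
    ordered = sorted(losses, reverse=True)
    # Round before ceil so floating-point noise in (1 - alpha) * n does not add a sample.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:k]) / k


def ru_objective(losses, alpha, t):
    """Rockafellar-Uryasev objective: t + mean(max(loss - t, 0)) / (1 - alpha)."""
    excess = sum(max(loss - t, 0.0) for loss in losses) / len(losses)
    return t + excess / (1.0 - alpha)


losses = [float(i) for i in range(1, 11)]  # hypothetical toy losses 1.0, ..., 10.0
alpha = 0.8  # assumed level: average over the worst 20% of samples
tail_average = empirical_cvar(losses, alpha)
# The objective is piecewise linear and convex in t, so a minimizer lies at a data point.
ru_minimum = min(ru_objective(losses, alpha, t) for t in losses)
print(tail_average, ru_minimum)
```

On this sample the two quantities agree up to floating-point rounding, which illustrates why CVaR minimization can be posed as a joint minimization over the model and the threshold `t`.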

BridgeStan: Efficient in-memory access to the methods of a Stan model

Edward A. Roualdes, B. Ward, B. Carpenter, Adrian Seyboldt, Seth D. Axen

Stan provides a probabilistic programming language in which users can code Bayesian models (Carpenter et al., 2017; Stan Development Team, 2022). A Stan program is transpiled to a C++ class that links to the Stan math library to implement smooth, unconstrained posterior log densities, gradients, and Hessians as well as constraining/unconstraining transforms. Implementation is provided through automatic differentiation in the Stan math library (Carpenter et al., 2015). BridgeStan provides in-memory access to the methods of Stan models through Python, Julia, R, and Rust. This allows algorithm development in these languages with the numerical efficiency and expressiveness of Stan models. Furthermore, these features are exposed through a language-agnostic C API, allowing foreign function interfaces in other languages to utilize BridgeStan with minimal additional development.

Maintaining symmetry during body axis elongation

Celia M. Smits, Sayantan Dutta, Vishank Jain-Sharma, Sebastian J. Streichan, S. Shvartsman

Bilateral symmetry defines much of the animal kingdom and is crucial for numerous functions of bilaterian organisms. Genetic approaches have discovered highly conserved patterning networks that establish bilateral symmetry in early embryos [1], but how this symmetry is maintained throughout subsequent morphogenetic events remains largely unknown [2]. Here we show that the terminal patterning system, which relies on Ras/ERK signaling through activation of the Torso receptor by its ligand Trunk [3], is critical for preserving bilateral symmetry during Drosophila body axis elongation, a process driven by cell rearrangements in the two identical lateral regions of the embryo and specified by the dorsal-ventral and anterior-posterior patterning systems [4]. We demonstrate that fluctuating asymmetries in this rapid convergent-extension process are attenuated in normal embryos over time, possibly through noise-dissipating forces from the posterior midgut invagination and movement. However, when Torso signaling is attenuated via mutation of Trunk or RNAi directed against downstream Ras/ERK pathway components, body axis elongation results in a characteristic corkscrew phenotype [5], which reflects dramatic reorganization of global tissue flow and is incompatible with viability. Our results reveal a new function downstream of the Drosophila terminal patterning system in potentially active control of bilateral symmetry and should motivate systematic search for similar symmetry-preserving regulatory mechanisms in other bilaterians.

Unlocking the Potential of Similarity Matching: Scalability, Supervision and Pre-training

Y. Bahroun, Shagesh Sridharan, Atithi Acharya, D. Chklovskii, A. Sengupta

While effective, the backpropagation (BP) algorithm exhibits limitations in terms of biological plausibility, computational cost, and suitability for online learning. As a result, there has been a growing interest in developing alternative biologically plausible learning approaches that rely on local learning rules. This study focuses on the primarily unsupervised similarity matching (SM) framework, which aligns with observed mechanisms in biological systems and offers online, localized, and biologically plausible algorithms. (i) To scale SM to large datasets, we propose an implementation of Convolutional Nonnegative SM using PyTorch. (ii) We introduce a localized supervised SM objective reminiscent of canonical correlation analysis, facilitating the stacking of SM layers. (iii) We leverage the PyTorch implementation for pre-training architectures such as LeNet and compare the evaluation of features against BP-trained models. This work combines biologically plausible algorithms with computational efficiency, opening multiple avenues for further exploration.

Kernelized Diffusion Maps

L. Pillaud-Vivien, Francis Bach, Ph.D.

Spectral clustering and diffusion maps are celebrated dimensionality reduction algorithms built on eigen-elements related to the diffusive structure of the data. The core of these procedures is the approximation of a Laplacian through a graph kernel approach; however, this local average construction is known to be cursed by the high dimension 𝑑. In this article, we build a different estimator of the Laplacian, via a reproducing kernel Hilbert space method, which adapts naturally to the regularity of the problem. We provide non-asymptotic statistical rates proving that the kernel estimator we build can circumvent the curse of dimensionality. Finally, we discuss techniques (Nyström subsampling, Fourier features) that make it possible to reduce the computational cost of the estimator without degrading its overall performance.

A note about convected time derivatives for flows of complex fluids

Howard A Stone, M. Shelley, Evgeniy Boyko

We present a direct derivation of the typical time derivatives used in a continuum description of complex fluid flows, harnessing the principles of the kinematics of line elements. The evolution of the microstructural conformation tensor in a flow and the physical interpretation of different derivatives then follow naturally.

Spin-valley magnetism on the triangular moiré lattice with SU(4) breaking interactions

D. Kiese, S. Trebst
The discovery of correlated insulating states in moiré heterostructures has renewed the interest in strongly-coupled electron systems where spin and valley (or layer) degrees of freedom are intertwined. In the strong-coupling limit, such systems can be effectively described by SU(4) spin-valley models akin to Kugel-Khomskii models long studied in the context of spin-orbit coupled materials. However, typical moiré heterostructures also exhibit interactions that break the SU(4) symmetry down to SU(2)

Nonequilibrium correlation dynamics in the one-dimensional Fermi-Hubbard model: A testbed for the two-particle reduced density matrix theory

B. Kloss, A. Rubio
We explore the non-equilibrium dynamics of a one-dimensional Fermi-Hubbard system as a sensitive testbed for the capabilities of the time-dependent two-particle reduced density matrix (TD2RDM) theory to accurately describe time-dependent correlated systems. We follow the time evolution of the out-of-equilibrium finite-size Fermi-Hubbard model initialized by a quench over extended periods of time. By comparison with exact calculations for small systems and with matrix product state (MPS) calculations for larger systems but limited to short times, we demonstrate that the TD2RDM theory can accurately account for the non-equilibrium dynamics in the regime from weak to moderately strong inter-particle correlations. We find that the quality of the approximate reconstruction of the three-particle cumulant (or correlation) required for the closure of the equations of motion for the reduced density matrix is key to the accuracy of the numerical TD2RDM results. We identify the size of the dynamically induced three-particle correlations and the amplitude of cross correlations between the two- and three-particle cumulants as critical parameters that control the accuracy of the TD2RDM theory when current state-of-the-art reconstruction functionals are employed.

Many-body delocalization from embedded thermal inclusion

We numerically study quantum avalanches in 1D disordered spin systems by attaching two XXZ spin chains. One chain has low disorder, representing a rare Griffiths region, or thermal inclusion, and the second has larger disorder, i.e. disorder larger than the observed finite-size crossover. Comparing the dynamics of this system to identical systems with uniformly large disorder, we find evidence for exponentially slow thermalization (in disorder) within the MBL regime when the rare region is present. We observe a decay of the spin imbalance in the bulk of the large disorder region that persists to long times (