2573 Publications

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, A. Bietti

Adam has been shown to outperform gradient descent on large language models by a larger margin than on other tasks, but it is unclear why. We show that a key factor in this performance gap is the heavy-tailed class imbalance found in language tasks. When trained with gradient descent, the loss of infrequent words decreases more slowly than the loss of frequent ones. This leads to a slow decrease on the average loss as most samples come from infrequent words. On the other hand, Adam and sign-based methods are less sensitive to this problem. To establish that this behavior is caused by class imbalance, we show empirically that it can be reproduced across architectures and data types, on language transformers, vision CNNs, and linear models. On a linear model with cross-entropy loss, we show that class imbalance leads to imbalanced, correlated gradients and Hessians that have been hypothesized to benefit Adam. We also prove that, in continuous time, gradient descent converges slowly on low-frequency classes while sign descent does not.
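
The mechanism described in this abstract can be illustrated with a toy sketch. It is not the authors' experiment: it assumes a separable quadratic loss with one parameter per class, weighted by hypothetical Zipf-like class frequencies, instead of a cross-entropy model. The names `train`, `freqs`, `gd_loss`, and `sign_loss` are made up for this illustration. Because each gradient coordinate is scaled by its class frequency, gradient descent makes slow progress on rare classes, while a sign-based update moves every coordinate at the same rate.

```python
import numpy as np

def train(direction, freqs, lr, steps):
    """Minimise sum_k freqs[k] * 0.5 * (w[k] - 1)^2 starting from w = 0.

    `direction` maps the gradient to an update direction.
    Returns the unweighted per-class loss 0.5 * (w[k] - 1)^2.
    """
    w = np.zeros_like(freqs)
    for _ in range(steps):
        grad = freqs * (w - 1.0)  # each coordinate is scaled by its class frequency
        w = w - lr * direction(grad)
    return 0.5 * (w - 1.0) ** 2

# Hypothetical Zipf-like frequencies for 10 classes, most to least frequent.
freqs = 1.0 / np.arange(1, 11)
freqs /= freqs.sum()

gd_loss = train(lambda g: g, freqs, lr=1.0, steps=50)   # plain gradient descent
sign_loss = train(np.sign, freqs, lr=0.01, steps=50)    # sign descent

print("GD   per-class loss:", gd_loss)
print("sign per-class loss:", sign_loss)
```

In this toy setting the gradient-descent loss grows with class rarity, whereas the sign-descent loss is identical for every class. The step sizes are chosen for illustration only and say nothing about which method is faster overall.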

Context-invariant beliefs are supported by dynamic reconfiguration of single unit functional connectivity in prefrontal cortex of male macaques

Jean-Paul Noel, E. Balzani, Cristina Savin, D. Angelaki

Natural behaviors occur in closed action-perception loops and are supported by dynamic and flexible beliefs abstracted away from our immediate sensory milieu. How this real-world flexibility is instantiated in neural circuits remains unknown. Here, we have male macaques navigate in a virtual environment by primarily leveraging sensory (optic flow) signals, or by more heavily relying on acquired internal models. We record single-unit spiking activity simultaneously from the dorsomedial superior temporal area (MSTd), parietal area 7a, and the dorso-lateral prefrontal cortex (dlPFC). Results show that while animals were able to maintain adaptive task-relevant beliefs regardless of sensory context, the fine-grain statistical dependencies between neurons, particularly in 7a and dlPFC, dynamically remapped with the changing computational demands. In dlPFC, but not 7a, destroying these statistical dependencies abolished the area’s ability for cross-context decoding. Lastly, correlational analyses suggested that the more unit-to-unit couplings remapped in dlPFC, and the less they did so in MSTd, the less population codes and behavior were impacted by the loss of sensory evidence. We conclude that dynamic functional connectivity between neurons in prefrontal cortex maintains a stable population code and context-invariant beliefs during naturalistic behavior.

E. coli do not count single molecules

H. Mattingly, Keita Kamino, Jude Ong, et al.

Organisms must perform sensory-motor behaviors to survive. What bounds or constraints limit behavioral performance? Previously, we found that the gradient-climbing speed of a chemotaxing Escherichia coli is near a bound set by the limited information they acquire from their chemical environments (1). Here we ask what limits their sensory accuracy. Past theoretical analyses have shown that the stochasticity of single molecule arrivals sets a fundamental limit on the precision of chemical sensing (2). Although it has been argued that bacteria approach this limit, direct evidence is lacking. Here, using information theory and quantitative experiments, we find that E. coli’s chemosensing is not limited by the physics of particle counting. First, we derive the physical limit on the behaviorally-relevant information that any sensor can get about a changing chemical concentration, assuming that every molecule arriving at the sensor is recorded. Then, we derive and measure how much information E. coli’s signaling pathway encodes during chemotaxis. We find that E. coli encode two orders of magnitude less information than an ideal sensor limited only by shot noise in particle arrivals. These results strongly suggest that constraints other than particle arrival noise limit E. coli’s sensory fidelity.

July 9, 2024

Yardangs sculpted by erosion of heterogeneous material

Samuel Boury, S. Weady, Leif Ristroph, et al.

The recognizable shapes of landforms arise from processes such as erosion by wind or water currents. However, explaining the physical origin of natural structures is challenging due to the coupled evolution of complex flow fields and three-dimensional (3D) topographies. We investigate these issues in a laboratory setting inspired by yardangs, which are raised, elongate formations whose characteristic shape suggests erosion of heterogeneous material by directional flows. We combine experiments and simulations to test an origin hypothesis involving a harder or less erodible inclusion embedded in an outcropping of softer material. Optical scans of clay objects fixed within flowing water reveal a transformation from a featureless mound to a yardang-like form resembling a lion in repose. Phase-field simulations reproduce similar shape dynamics and show their dependence on the erodibility contrast and flow strength. Through visualizations of the flow fields and analysis of the local erosion rate, we identify effects associated with flow funneling and the turbulent wake that are responsible for carving the unique geometrical features. This highly 3D scouring process produces complex shapes from simple and commonplace starting conditions and is thus a candidate explanation for natural yardangs. The methods introduced here should be generally useful for geomorphological problems and especially those for which material heterogeneity is a primary factor.

Delayed rejection Hamiltonian Monte Carlo for sampling multiscale distributions

The efficiency of Hamiltonian Monte Carlo (HMC) can suffer when sampling a distribution with a wide range of length scales, because the small step sizes needed for stability in high-curvature regions are inefficient elsewhere. To address this we present a delayed rejection (DR) variant: if an initial HMC trajectory is rejected, we make one or more subsequent proposals each using a step size geometrically smaller than the last. To reduce the cost of DR approaches, we extend the standard delayed rejection to a probabilistic framework wherein we do not make multiple proposals at every rejection, but allow the probability of a retry to depend on the probability of accepting the previous proposal. We test the scheme in several sampling tasks, including statistical applications and multiscale model distributions such as Neal’s funnel. Delayed rejection enables sampling multiscale distributions for which standard approaches such as HMC fail to explore the tails, and improves performance five-fold over optimally-tuned HMC as measured by effective sample size per gradient evaluation. Even for simpler distributions, delayed rejection provides increased robustness to step size misspecification.

posteriordb: Testing, Benchmarking and Developing Bayesian Inference Algorithms

Måns Magnusson, Jakob Torgander, Paul-Christian Bürkner, Lu Zhang, B. Carpenter, Aki Vehtari

The generality and robustness of inference algorithms is critical to the success of widely used probabilistic programming languages such as Stan, PyMC, Pyro, and Turing.jl. When designing a new general-purpose inference algorithm, whether it involves Monte Carlo sampling or variational approximation, the fundamental problem arises in evaluating its accuracy and efficiency across a range of representative target models. To solve this problem, we propose posteriordb, a database of models and data sets defining target densities along with reference Monte Carlo draws. We further provide a guide to the best practices in using posteriordb for model evaluation and comparison. To provide a wide range of realistic target densities, posteriordb currently comprises 120 representative models and has been instrumental in developing several general inference algorithms.

Good Rates From Bad Coordinates: The Exponential Average Time-dependent Rate Approach

Nicodemo Mazzaferro, Subarna Sasmal, P. Cossio, Glen M. Hocky

Our ability to calculate rate constants of biochemical processes using molecular dynamics simulations is severely limited by the fact that the time scales for reactions, or changes in conformational state, scale exponentially with the relevant free-energy barrier heights. In this work, we improve upon a recently proposed rate estimator that allows us to predict transition times with molecular dynamics simulations biased to rapidly explore one or several collective variables (CVs). This approach relies on the idea that not all bias goes into promoting transitions, and along with the rate, it estimates a concomitant scale factor for the bias termed the “CV biasing efficiency” γ. First, we demonstrate mathematically that our new formulation allows us to derive the commonly used Infrequent Metadynamics (iMetaD) estimator when using a perfect CV, where γ = 1. After testing it on a model potential, we then study the unfolding behavior of a previously well characterized coarse-grained protein, which is sufficiently complex that we can choose many different CVs to bias, but which is sufficiently simple that we are able to compute the unbiased rate directly. For this system, we demonstrate that predictions from our new Exponential Average Time-Dependent Rate (EATR) estimator converge to the true rate constant more rapidly as a function of bias deposition time than does the previous iMetaD approach, even for bias deposition times that are short. We also show that the γ parameter can serve as a good metric for assessing the quality of the biasing coordinate. We demonstrate that these results hold when applying the methods to an atomistic protein folding example. Finally, we demonstrate that our approach works when combining multiple less-than-optimal bias coordinates, and adapt our method to the related “OPES flooding” approach. Overall, our time-dependent rate approach offers a powerful framework for predicting rate constants from biased simulations.

Fishing for Planets: A Comparative Analysis of EPRV Survey Performance in the Presence of Correlated Noise

A. Gupta, M. Bedell

With dedicated exoplanet surveys underway for multiple extreme-precision radial velocity (EPRV) instruments, the near-future prospects of RV exoplanet science are promising. These surveys' generous time allocations are expected to facilitate the discovery of Earth analogs around bright, nearby Sun-like stars. But survey success will depend critically on the choice of observing strategy, which will determine the survey's ability to mitigate known sources of noise and extract low-amplitude exoplanet signals. Here we present an analysis of the Fisher information content of simulated EPRV surveys, accounting for the most recent advances in our understanding of stellar variability on both short and long timescales (i.e., oscillations and granulation within individual nights, and activity-induced variations across multiple nights). In this analysis, we capture the correlated nature of stellar variability by parameterizing these signals with Gaussian process kernels. We describe the underlying simulation framework and the physical interpretation of the Fisher information content, and we evaluate the efficacy of EPRV survey strategies that have been presented in the literature. We explore and compare strategies for scheduling observations over various timescales, and we make recommendations to optimize survey performance for the detection of Earth-like exoplanets.
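
As a rough illustration of the kind of calculation this abstract describes, the sketch below computes the Fisher information on the semi-amplitude of a sinusoidal signal under Gaussian noise with a given covariance matrix. It uses the standard result F = (dμ/dK)ᵀ C⁻¹ (dμ/dK) for a Gaussian likelihood whose covariance does not depend on K. It is not the authors' simulation framework: the squared-exponential kernel, the nightly cadence, and all numerical values are assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def fisher_information_amplitude(t, period, cov):
    """Fisher information on K for mu(t) = K * sin(2*pi*t/period), noise covariance `cov`."""
    dmu_dk = np.sin(2.0 * np.pi * t / period)        # derivative of the signal w.r.t. K
    return dmu_dk @ np.linalg.solve(cov, dmu_dk)     # dmu^T C^{-1} dmu

def squared_exponential_kernel(t, amplitude, length_scale):
    """Stand-in GP kernel for correlated noise (an assumption, not the paper's kernel)."""
    dt = t[:, None] - t[None, :]
    return amplitude**2 * np.exp(-0.5 * (dt / length_scale) ** 2)

t = np.arange(0.0, 100.0, 1.0)                       # hypothetical nightly cadence, in days
period = 30.0                                        # hypothetical orbital period, in days
white_cov = 0.3**2 * np.eye(t.size)                  # uncorrelated noise only
corr_cov = white_cov + squared_exponential_kernel(t, amplitude=1.0, length_scale=5.0)

f_white = fisher_information_amplitude(t, period, white_cov)
f_corr = fisher_information_amplitude(t, period, corr_cov)

# 1/sqrt(F) is the Cramer-Rao lower bound on the uncertainty of K.
print("sigma_K, white noise only:    ", 1.0 / np.sqrt(f_white))
print("sigma_K, with correlated term:", 1.0 / np.sqrt(f_corr))
```

Adding a positive semi-definite correlated-noise term to the covariance can only lower the Fisher information, so the bound on the amplitude uncertainty grows. Comparing such bounds across different sets of observation times `t` is one simple way to compare scheduling strategies.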

Open Data In Neurophysiology: Advancements, Solutions & Challenges

Colleen J Gillon, Cody Baker, Ryan Ly, E. Balzani, Bingni W Brunton, Manuel Schottdorf, Satrajit Ghosh, Noma Dehghani

Across the life sciences, an ongoing effort over the last 50 years has made data and methods more reproducible and transparent. This openness has led to transformative insights and vastly accelerated scientific progress(1,2). For example, structural biology(3) and genomics(4,5) have undertaken systematic collection and publication of protein sequences and structures over the past half-century, and these data have led to scientific breakthroughs that were unthinkable when data collection first began (e.g.(6)). We believe that neuroscience is poised to follow the same path, and that principles of open data and open science will transform our understanding of the nervous system in ways that are impossible to predict at the moment. To this end, new social structures along with active and open scientific communities are essential(7) to facilitate and expand the still limited adoption of open science practices in our field(8). Unified by shared values of openness, we set out to organize a symposium for Open Data in Neuroscience (ODIN) to strengthen our community and facilitate transformative neuroscience research at large. In this report, we share what we learned during this first ODIN event. We also lay out plans for how to grow this movement, document emerging conversations, and propose a path toward a better and more transparent science of tomorrow.

Magnetic, charge, and bond order in the two-dimensional Su-Schrieffer-Heeger-Holstein model

Most nonperturbative numerical studies of electron-phonon interactions focus on model Hamiltonians where the electrons interact with a phonon branch via a single type of microscopic mechanism. Two commonly explored couplings in this context are the Holstein and Su-Schrieffer-Heeger (SSH) interactions, which describe phonons modulating the on-site energy and intersite electron hopping, respectively. Many materials, however, have multiple phonon branches that can each interact with electronic degrees of freedom in different ways. We present here a determinant quantum Monte Carlo study of the half-filled two-dimensional (bond) SSH-Holstein Hamiltonian, where electrons couple to different phonon branches via either the Holstein or SSH mechanism. We map the model's phase diagram and determine the nature of the transitions between charge-density wave, bond-order wave, and antiferromagnetic order.
July 1, 2024