2596 Publications

Variational Inference for Uncertainty Quantification: an Analysis of Trade-offs

C. Margossian, L. Pillaud-Vivien, L. Saul

Given an intractable distribution p, the problem of variational inference (VI) is to find the best approximation from some more tractable family Q. Commonly, one chooses Q to be a family of factorized distributions (i.e., the mean-field assumption), even though p itself does not factorize. We show that this mismatch leads to an impossibility theorem: if p does not factorize, then any factorized approximation q ∈ Q can correctly estimate at most one of the following three measures of uncertainty: (i) the marginal variances, (ii) the marginal precisions, or (iii) the generalized variance (which can be related to the entropy). In practice, the best variational approximation in Q is found by minimizing some divergence D(q, p) between distributions, and so we ask: how does the choice of divergence determine which measure of uncertainty, if any, is correctly estimated by VI? We consider the classic Kullback-Leibler divergences, the more general Rényi divergences, and a score-based divergence which compares ∇ log p and ∇ log q. We provide a thorough theoretical analysis in the setting where p is a Gaussian and q is a (factorized) Gaussian. We show that all the considered divergences can be

How Truncating Weights Improves Reasoning in Language Models

Lei Chen, Joan Bruna, A. Bietti

In addition to the ability to generate fluent text in various languages, large language models have been successful at tasks that involve basic forms of logical "reasoning" over their context. Recent work found that selectively removing certain components from weight matrices in pre-trained models can improve such reasoning capabilities. We investigate this phenomenon further by carefully studying how certain global associations tend to be stored in specific weight components or Transformer blocks, in particular feed-forward layers. Such associations may hurt predictions in reasoning tasks, and removing the corresponding components may then improve performance. We analyze how this arises during training, both empirically and theoretically, on a two-layer Transformer trained on a basic reasoning task with noise, a toy associative memory model, and on the Pythia family of pre-trained models tested on simple reasoning tasks.

Dynamical correlation functions from complex time evolution

We present an approach to tame the growth of entanglement during time evolution by tensor network methods. It combines time evolution in the complex plane with a perturbative and controlled reconstruction of correlation functions on the real time axis. We benchmark our approach on the single impurity Anderson model. Compared to purely real time evolution, the complex time evolution significantly reduces the required bond dimension to obtain the spectral function. Notably, our approach yields self-energy results with high precision at low frequencies, comparable to numerical renormalization group results, and it successfully captures the exponentially small Kondo energy scale.

Matrix Product Study of Spin Fractionalization in the 1D Kondo Insulator

J. Chen, M. Stoudenmire, Yashar Komijani, Piers Coleman

The Kondo lattice is one of the classic examples of strongly correlated electronic systems. We conduct a controlled study of the Kondo lattice in one dimension, highlighting the role of excitations created by the composite fermion operator. Using time-dependent matrix product state methods, we compute various correlation functions and contrast them with both large-N mean-field theory and the strong-coupling expansion. We show that the composite fermion operator creates long-lived, charge-e and spin-1/2 excitations, which cover the low-lying single-particle excitation spectrum of the system. Furthermore, spin excitations can be thought of as composed of such fractionalized quasiparticles with a residual interaction that tends to disappear at weak Kondo coupling.

Martini without the twist: Unveiling a mechanically correct microtubule through bottom-up coarse-graining in Martini 3

Microtubules are essential cytoskeletal filaments involved in cell motility, division, and intracellular transport. These biomolecular assemblies can exhibit complex structural behaviors influenced by various biophysical factors. However, simulating microtubule systems at the atomistic scale is challenging due to their large spatial scales. Here, we present an approach utilizing the Martini 3 Coarse-Grained (CG) model coupled with an appropriate elastic network to simulate microtubule-based systems accurately. By iteratively optimizing the elastic network parameters, we matched the structural fluctuations of CG heterodimer building blocks to their atomistic counterparts. Our efforts culminated in a ∼200 nm microtubule built with ∼6 million interaction centers that could reproduce experimentally observed mechanical properties. Our aim is to employ these CG simulations to investigate specific biophysical phenomena at a microscopic level. These microscopic perspectives can provide valuable insights into the underlying mechanisms and contribute to our knowledge of microtubule-associated processes in cellular biology. With Martini 3 CG simulations, we can bridge the gap between computational efficiency and molecular detail, enabling investigations into these biophysical processes over longer spatio-temporal scales with amino acid-level insights.

June 1, 2024

Metabolic imaging of human cumulus cells reveals associations with pregnancy and live birth

M. Venturas, C. Racowsky, D. Needleman

Can fluorescence lifetime imaging microscopy (FLIM) detect associations between the metabolic state of cumulus cell (CC) samples and the clinical outcome of the corresponding embryos?

FLIM can detect significant variations in the metabolism of CC associated with the corresponding embryos that resulted in a clinical pregnancy versus those that did not.

Layerwise complexity-matched learning yields an improved model of cortical area V2

N. Parthasarathy, O. J. Henaff, E. P. Simoncelli

Human ability to recognize complex visual patterns arises through transformations performed by successive areas in the ventral visual cortex. Deep neural networks trained end-to-end for object recognition approach human capabilities, and offer the best descriptions to date of neural responses in the late stages of the hierarchy. But these networks provide a poor account of the early stages, compared to traditional hand-engineered models, or models optimized for coding efficiency or prediction. Moreover, the gradient backpropagation used in end-to-end learning is generally considered to be biologically implausible. Here, we overcome both of these limitations by developing a bottom-up self-supervised training methodology that operates independently on successive layers. Specifically, we maximize feature similarity between pairs of locally-deformed natural image patches, while decorrelating features across patches sampled from other images. Crucially, the deformation amplitudes are adjusted proportionally to receptive field sizes in each layer, thus matching the task complexity to the capacity at each stage of processing. In comparison with architecture-matched versions of previous models, we demonstrate that our layerwise complexity-matched learning (LCL) formulation produces a two-stage model (LCL-V2) that is better aligned with selectivity properties and neural activity in primate area V2. We demonstrate that the complexity-matched learning paradigm is responsible for much of the emergence of the improved biological alignment. Finally, when the two-stage model is used as a fixed front end for a deep network trained to perform object recognition, the resultant model (LCL-V2Net) is significantly better than standard end-to-end self-supervised, supervised, and adversarially-trained models in terms of generalization to out-of-distribution tasks and alignment with human behavior. Our code and pre-trained checkpoints are available at https://github.com/nikparth/LCL-V2.git

Offline supervised learning vs online direct policy optimization: A comparative study and a unified training paradigm for neural network-based optimal feedback control

Yue Zhao, J. Han

This work is concerned with efficiently training neural network-based feedback controllers for optimal control problems. We first conduct a comparative study of two prevalent approaches: offline supervised learning and online direct policy optimization. Although the training part of the supervised learning approach is relatively easy, the success of the method heavily depends on the optimal control dataset generated by open-loop optimal control solvers. In contrast, direct policy optimization turns the optimal control problem into an optimization problem directly without any requirement of pre-computing, but the dynamics-related objective can be hard to optimize when the problem is complicated. Our results underscore the superiority of offline supervised learning in terms of both optimality and training time. To overcome the main challenges of the two approaches, dataset and optimization respectively, we combine them and propose the Pre-train and Fine-tune strategy as a unified training paradigm for optimal feedback control, which further improves the performance and robustness significantly. Our code is accessible at https://github.com/yzhao98/DeepOptimalControl.

Sublattice Structure and Topology in Spontaneously Crystallized Electronic States

Y. Zeng, D. Guerci, V. Crépel, J. Cano

The prediction and realization of the quantum anomalous Hall effect are often intimately connected to honeycomb lattices in which the sublattice degree of freedom plays a central role in the nontrivial topology. Two-dimensional Wigner crystals, on the other hand, form triangular lattices without sublattice degrees of freedom, resulting in a topologically trivial state. In this Letter, we discuss the possibility of spontaneously formed honeycomb-lattice crystals that exhibit the quantum anomalous Hall effect. Starting from a single-band system with nontrivial quantum geometry, we derive the mean-field energy functional of a class of crystal states and express it as a model of sublattice pseudospins in momentum space. We find that nontrivial quantum geometry leads to extra terms in the pseudospin model that break an effective 'time-reversal symmetry' and favor a topologically nontrivial pseudospin texture. When the effects of these extra terms dominate over the ferromagnetic exchange coupling between pseudospins, the anomalous Hall crystal state becomes energetically favorable over the trivial Wigner crystal state.
June 1, 2024

High-harmonic spectroscopy of strongly bound excitons in solids

We explore the nonlinear response of ultrafast strong-field driven excitons in a one-dimensional solid with ab initio simulations. We demonstrate from our simulations and analytical model that a finite population of excitons imprints unique signatures to the high-harmonic spectra of materials. We show the exciton population can be retrieved from the spectra. We further demonstrate signatures of exciton recombination and that a shift of the exciton level is imprinted into the harmonic signal. The results open the door to high-harmonic spectroscopy of excitons in condensed-matter systems.
June 1, 2024