2697 Publications

Counterfactual Learning of Stochastic Policies with Continuous Actions

Houssam Zenati, A. Bietti, Matthieu Martin, Eustache Diemert, Julien Mairal

Counterfactual reasoning from logged data has become increasingly important for many applications such as web advertising or healthcare. In this paper, we address the problem of counterfactual risk minimization (CRM) for learning a stochastic policy with continuous actions, whereas most existing work has focused on the discrete setting. Switching from discrete to continuous action spaces presents several difficulties as naive discretization strategies have been shown to perform poorly. To deal with this issue, we first introduce an effective contextual modelling strategy that learns a joint representation of contexts and actions based on positive definite kernels. Second, we empirically show that the optimization perspective of CRM is more important than previously thought, and we demonstrate the benefits of proximal point algorithms and differentiable estimators. Finally, we propose an evaluation protocol for offline policies in real-world logged systems, which is challenging since policies cannot be replayed on test data, and we release a new large-scale dataset along with multiple synthetic, yet realistic, evaluation setups.
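As a deliberately simplified illustration of why continuous actions require smoothing, the sketch below replaces the exact-match indicator of discrete-action inverse-propensity scoring (IPS) with a Gaussian kernel. The kernel choice, bandwidth, and synthetic logging policy are all assumptions for illustration only, not the paper's learned joint context-action kernel representation:

```python
import numpy as np

def smoothed_ips_risk(costs, logged_actions, target_actions,
                      logging_density, bandwidth=0.5):
    """Kernel-smoothed inverse-propensity estimate of a target policy's
    risk from logged data. The Gaussian kernel replaces the action-match
    indicator used with discrete actions; kernel and bandwidth are
    illustrative choices."""
    diff = target_actions - logged_actions
    kernel = np.exp(-0.5 * (diff / bandwidth) ** 2) / (
        bandwidth * np.sqrt(2.0 * np.pi))
    return float(np.mean(costs * kernel / logging_density))

rng = np.random.default_rng(0)
n = 20_000
logged = rng.normal(0.0, 1.0, n)                     # logging policy N(0, 1)
density = np.exp(-0.5 * logged ** 2) / np.sqrt(2.0 * np.pi)
costs = (logged - 1.0) ** 2                          # cost minimized at a = 1
risk_good = smoothed_ips_risk(costs, logged, np.full(n, 1.0), density)
risk_bad = smoothed_ips_risk(costs, logged, np.full(n, -1.0), density)
```

On this synthetic problem the estimator correctly ranks a policy playing actions near the cost minimum below one playing actions far from it.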


The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models

A. Kirsanov, C. Chou, Kyunghyun Cho, S. Chung

Decoder-only language models have the ability to dynamically switch between various computational tasks based on input prompts. Despite many successful applications of prompting, there is very limited understanding of the internal mechanism behind such flexibility. In this work, we investigate how different prompting methods affect the geometry of representations in these models. Employing a framework grounded in statistical physics, we reveal that various prompting techniques, while achieving similar performance, operate through distinct representational mechanisms for task adaptation. Our analysis highlights the critical role of input distribution samples and label semantics in few-shot in-context learning. We also demonstrate evidence of synergistic and interfering interactions between different tasks on the representational level. Our work contributes to the theoretical understanding of large language models and lays the groundwork for developing more effective, representation-aware prompting strategies.

February 11, 2025

Identifying new classes of financial price jumps with wavelets

Cecilia Aubrun, R. Morel, Michael Benzaquen, Jean-Philippe Bouchaud

We introduce an unsupervised classification framework that leverages a multiscale wavelet representation of time-series and apply it to stock price jumps. In line with previous work, we recover the fact that time-asymmetry of volatility is the major feature that separates exogenous, news-induced jumps from endogenously generated jumps. Local mean-reversion and trend are found to be two additional key features, allowing us to identify new classes of jumps. Using our wavelet-based representation, we investigate the endogenous or exogenous nature of cojumps, which occur when multiple stocks experience price jumps within the same minute. Perhaps surprisingly, our analysis suggests that a significant fraction of cojumps result from an endogenous contagion mechanism.
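A toy version of one such feature can be computed with a plain Haar wavelet in NumPy: the normalized difference between post- and pre-jump wavelet energy separates activity that follows a jump (exogenous-like) from activity that builds up before it (endogenous-like). The Haar filter, the scales, and the synthetic jump series are illustrative assumptions, not the paper's full multiscale representation:

```python
import numpy as np

def haar_detail(x, s):
    """Haar wavelet detail coefficients of a 1-D series at scale s."""
    filt = np.r_[np.ones(s), -np.ones(s)] / np.sqrt(2.0 * s)
    return np.convolve(x, filt, mode="valid")

def volatility_asymmetry(returns, jump_idx, scales=(2, 4, 8)):
    """Scale-averaged, normalized (post - pre) wavelet energy around a
    jump, in [-1, 1]. Positive: activity follows the jump; negative:
    activity precedes it."""
    vals = []
    for s in scales:
        d = haar_detail(returns, s) ** 2
        pre = d[: jump_idx - 2 * s].mean()       # windows fully before jump
        post = d[jump_idx:].mean()               # windows starting at jump
        vals.append((post - pre) / (post + pre))
    return float(np.mean(vals))

rng = np.random.default_rng(1)
n, t0 = 2000, 1000
decay = np.exp(-np.arange(n - t0) / 50.0)
exo = np.r_[0.1 * rng.normal(size=t0),           # quiet before the jump
            (0.1 + 2.0 * decay) * rng.normal(size=n - t0)]
endo = exo[::-1].copy()                          # activity builds up instead
```

The feature is positive for the news-like series with volatility decaying after the jump, and negative for its time-reversed, build-up counterpart.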


Spatial Frequency Maps in Human Visual Cortex: A Replication and Extension

Jiyeong Ha, B. Broderick, Kendrick Kay, J. Winawer

In a step toward developing a model of human primary visual cortex, a recent study introduced a model of spatial frequency tuning in V1 (Broderick, Simoncelli, & Winawer, 2022). The model is compact, using just 9 parameters to predict BOLD response amplitude for locations across all of V1 as a function of stimulus orientation and spatial frequency. Here we replicated this analysis in a new dataset, the ‘nsdsynthetic’ supplement to the Natural Scenes Dataset (Allen et al., 2022), to assess generalization of model parameters. Furthermore, we extended the analyses to extrastriate maps V2 and V3. For each retinotopic map in the 8 NSD subjects, we fit the 9-parameter model. Despite many experimental differences between NSD and the original study, including stimulus size, experimental design, and MR field strength, there was good agreement in most model parameters. The dependence of preferred spatial frequency on eccentricity in V1 was similar between NSD and Broderick et al. Moreover, the effect of absolute stimulus orientation on spatial frequency maps was similar: higher preferred spatial frequency for horizontal and cardinal orientations compared to vertical and oblique orientations in both studies. The extension to extrastriate maps revealed that the biggest change in tuning between maps was in bandwidth: the bandwidth in spatial frequency tuning increased by 70% from V1 to V2 and 100% from V1 to V3, paralleling known increases in receptive field size. Together, the results show robust reproducibility and bring us closer to a systematic characterization of spatial encoding in the human visual system.
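The functional form of the model can be sketched as a preferred spatial period growing linearly with eccentricity, modulated by cardinal and oblique orientation terms. All parameter values below are illustrative placeholders, not the fitted values from either study:

```python
import numpy as np

def preferred_period(ecc, theta, a=0.12, b=0.35, amp2=0.02, amp4=0.04):
    """Preferred spatial period (deg/cycle) vs eccentricity and absolute
    orientation theta (radians; 0 = horizontal): linear growth with
    eccentricity, modulated by cardinal (cos 2*theta) and oblique
    (cos 4*theta) terms. Parameter values are illustrative only."""
    return (a * ecc + b) * (1.0 - amp2 * np.cos(2.0 * theta)
                                - amp4 * np.cos(4.0 * theta))

def preferred_sf(ecc, theta):
    """Preferred spatial frequency in cycles/deg (reciprocal period)."""
    return 1.0 / preferred_period(ecc, theta)
```

With this form, preferred spatial frequency falls with eccentricity and is highest for horizontal, then vertical, then oblique orientations, matching the qualitative pattern described above.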

February 5, 2025

Full minimal coupling Maxwell-TDDFT: An ab initio framework for light-matter interaction beyond the dipole approximation

We report the first ab initio, non-relativistic QED method that couples light and matter self-consistently beyond the electric dipole approximation and without multipolar truncations. This method is based on an extension of the Maxwell-Pauli-Kohn-Sham approach to a full minimal coupling Hamiltonian, where the space- and time-dependent vector potential is coupled to the matter system, and the back-reaction of the matter on the radiated fields is generated by the full current density. The implementation in the open-source Octopus code is designed for massively parallel multiscale simulations with different grid spacings for the Maxwell and matter subsystems. Here, we show the first applications of this framework to simulate renormalized Cherenkov radiation of an electronic wavepacket, magneto-optical effects with non-chiral light in non-chiral molecular systems, and renormalized plasmonic modes in a nanoplasmonic dimer. We show that in some cases the beyond-dipole effects cannot be captured by a multipolar expansion Hamiltonian in the length gauge. Finally, we discuss further opportunities enabled by the framework in the fields of twisted light and orbital angular momentum, inelastic light scattering, and strong-field physics.
February 1, 2025

Scale dependencies and self-similar models with wavelet scattering spectra

R. Morel, Gaspar Rochette, Roberto Leonarduzzi, Jean-Philippe Bouchaud, S. Mallat

Multi-scale non-Gaussian time-series having stationary increments appear in a wide range of applications, particularly in finance and physics. We introduce stochastic models that capture intermittency phenomena such as crises or bursts of activity, time reversal asymmetries, and that can be estimated from a single realization of size N. Variations at multiple scales are separated with a wavelet transform. Non-Gaussian properties appear through dependencies of wavelet coefficients across scales. We define maximum entropy models from the joint correlation across time and scales of wavelet coefficients and their modulus. Diagonal matrix approximations are estimated with a wavelet representation of this joint correlation. The resulting diagonals define O(log^3 N) moments that are called scattering spectra. A notion of wide-sense self-similarity is defined from the invariance of scattering spectra to scaling, which can be tested numerically on a single realization. We study the accuracy of maximum entropy scattering spectra models for fractional Brownian motions, Hawkes processes, multifractal random walks, as well as financial and turbulent time-series.
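A toy, real-valued analogue of these statistics can be sketched with Haar wavelets: per-scale second moments plus cross-scale correlations of coefficient moduli. The Haar filter and the Brownian test path are assumptions for illustration; the paper uses complex wavelets and a richer set of diagonal moments:

```python
import numpy as np

def haar_wavelet(x, j):
    """Haar coefficients at dyadic scale 2**j (a stand-in for the
    complex wavelets used by scattering spectra)."""
    s = 2 ** j
    filt = np.r_[np.ones(s), -np.ones(s)] / np.sqrt(2.0 * s)
    return np.convolve(x, filt, mode="valid")

def scattering_like_moments(x, J=5):
    """Per-scale energies plus cross-scale correlations of coefficient
    moduli: a toy analogue of scattering spectra moments."""
    W = [haar_wavelet(x, j) for j in range(1, J + 1)]
    power = [float(np.mean(w ** 2)) for w in W]      # per-scale energy
    cross = np.zeros((J, J))
    for a in range(J):
        for b in range(J):
            m = min(W[a].size, W[b].size)            # align lengths crudely
            cross[a, b] = np.corrcoef(np.abs(W[a][:m]),
                                      np.abs(W[b][:m]))[0, 1]
    return power, cross

rng = np.random.default_rng(3)
bm = np.cumsum(rng.normal(size=4096))    # Brownian-motion-like test path
power, cross = scattering_like_moments(bm, J=5)
```

The modulus cross-correlations are exactly the kind of across-scale dependency that distinguishes non-Gaussian, intermittent processes from Gaussian ones.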


The No-Underrun Sampler: A Locally-Adaptive, Gradient-Free MCMC Method

N. Bou-Rabee, B. Carpenter, S. Liu, Stefan Oberdörster

In this work, we introduce the No-Underrun Sampler (NURS), a locally-adaptive, gradient-free Markov chain Monte Carlo method that blends ideas from Hit-and-Run and the No-U-Turn Sampler. NURS dynamically adapts to the local scale of the target distribution without requiring gradient evaluations, making it especially suitable for applications where gradients are unavailable or costly. We establish key theoretical properties, including reversibility, formal connections to Hit-and-Run and Random Walk Metropolis, Wasserstein contraction comparable to Hit-and-Run in Gaussian targets, and bounds on the total variation distance between the transition kernels of Hit-and-Run and NURS. Empirical experiments, supported by theoretical insights, illustrate the ability of NURS to sample from Neal's funnel, a challenging multi-scale distribution from Bayesian hierarchical inference.
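For orientation, here is a minimal sketch of classical Hit-and-Run, one of the two ingredients NURS blends. The draw along the line uses a randomly offset grid as an illustrative simplification; it is not the NURS orbit construction, but it is likewise gradient-free:

```python
import numpy as np

def hit_and_run_step(x, log_density, rng, step=1.0, n_grid=32):
    """One Hit-and-Run move: draw a uniformly random direction, then
    sample a point along that line from the conditional density,
    discretized on a randomly offset grid. No gradients needed."""
    d = rng.normal(size=x.shape)
    d /= np.linalg.norm(d)                            # random direction
    ts = step * (rng.uniform() + np.arange(-n_grid, n_grid))
    logp = np.array([log_density(x + t * d) for t in ts])
    p = np.exp(logp - logp.max())
    t = rng.choice(ts, p=p / p.sum())                 # draw along the line
    return x + t * d

# Sanity check against a standard 2-D Gaussian target.
rng = np.random.default_rng(2)
log_density = lambda z: -0.5 * float(z @ z)
x = np.zeros(2)
samples = []
for _ in range(4000):
    x = hit_and_run_step(x, log_density, rng)
    samples.append(x.copy())
samples = np.array(samples)
```

On this Gaussian target the chain's samples roughly match the target's zero mean and unit marginal scale.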


A fully adaptive, high-order, fast Poisson solver for complex two-dimensional geometries

We present a new framework for the fast solution of inhomogeneous elliptic boundary value problems in domains with smooth boundaries. High-order solvers based on adaptive box codes or the fast Fourier transform can efficiently treat the volumetric inhomogeneity, but require care to be taken near the boundary to ensure that the volume data is globally smooth. We avoid function extension or cut-cell quadratures near the boundary by dividing the domain into two regions: a bulk region away from the boundary that is efficiently treated with a truncated free-space box code, and a variable-width boundary-conforming strip region that is treated with a spectral collocation method and accompanying fast direct solver. Particular solutions in each region are then combined with Laplace layer potentials to yield the global solution. The resulting solver has an optimal computational complexity of O(N) for an adaptive discretization with N degrees of freedom. With an efficient two-dimensional (2D) implementation we demonstrate adaptive resolution of volumetric data, boundary data, and geometric features across a wide range of length scales, to typically 10-digit accuracy. The cost of all boundary corrections remains small relative to that of the bulk box code. The extension to 3D is expected to be straightforward in many cases.
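The role of a fast volumetric solver in the bulk can be illustrated with the simplest spectral analogue: a periodic FFT Poisson solve on a square against a manufactured solution. The paper's bulk solver is a truncated free-space box code; this periodic sketch only illustrates fast, high-order treatment of smooth volume data:

```python
import numpy as np

# Spectral Poisson solve, lap(u) = f, on a periodic square with a
# manufactured solution u = sin(x) cos(2y), so f = -5 u.
n, L = 64, 2.0 * np.pi
x = np.linspace(0.0, L, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
u_exact = np.sin(X) * np.cos(2.0 * Y)
f = -5.0 * u_exact
k = np.fft.fftfreq(n, d=L / (2.0 * np.pi * n))   # integer wavenumbers
KX, KY = np.meshgrid(k, k, indexing="ij")
k2 = KX ** 2 + KY ** 2
k2[0, 0] = 1.0                 # avoid 0/0 for the mean mode
u_hat = -np.fft.fft2(f) / k2   # (-|k|^2) u_hat = f_hat
u_hat[0, 0] = 0.0              # pin the arbitrary additive constant
u = np.real(np.fft.ifft2(u_hat))
err = float(np.max(np.abs(u - u_exact)))
```

For band-limited data the spectral solve is exact to roundoff, which is why the expensive work can be pushed into the smooth bulk while the boundary strip handles geometry.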


Sub-cellular population imaging tools reveal stable apical dendrites in hippocampal area CA3

J. Moore, Shannon K. Rashid, Emmett Bicker, Cara D. Johnson, Naomi Codrington, D. Chklovskii, Jayeeta Basu

Apical and basal dendrites of pyramidal neurons receive anatomically and functionally distinct inputs, implying compartment-level functional diversity during behavior. To test this, we imaged in vivo calcium signals from soma, apical dendrites, and basal dendrites in mouse hippocampal CA3 pyramidal neurons during head-fixed navigation. To capture compartment-specific population dynamics, we developed computational tools to automatically segment dendrites and extract accurate fluorescence traces from densely labeled neurons. We validated the method on sparsely labeled preparations and synthetic data, predicting an optimal labeling density for high experimental throughput and analytical accuracy. Our method detected rapid, local dendritic activity. Dendrites showed robust spatial tuning, similar to soma but with higher activity rates. Across days, apical dendrites remained more stable and outperformed in decoding of the animal’s position. Thus, population-level apical and basal dendritic differences may reflect distinct compartment-specific input-output functions and computations in CA3. These tools will facilitate future studies mapping sub-cellular activity and their relation to behavior.


Understanding Optimization in Deep Learning with Central Flows

J. Cohen, Alex Damian, Ameet Talwalkar, J Zico Kolter, Jason D. Lee

Optimization in deep learning remains poorly understood. A key difficulty is that optimizers exhibit complex oscillatory dynamics, referred to as "edge of stability," which cannot be captured by traditional optimization theory. In this paper, we show that the path taken by an oscillatory optimizer can often be captured by a central flow: a differential equation which directly models the time-averaged (i.e. smoothed) optimization trajectory. We empirically show that these central flows can predict long-term optimization trajectories for generic neural networks with a high degree of numerical accuracy. By interpreting these flows, we are able to understand how gradient descent makes progress even as the loss sometimes goes up; how adaptive optimizers "adapt" to the local loss landscape; and how adaptive optimizers implicitly seek out regions of weight space where they can take larger steps. These insights (and others) are not apparent from the optimizers' update rules, but are revealed by the central flows. Therefore, we believe that central flows constitute a promising tool for reasoning about optimization in deep learning.
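The oscillation-versus-average distinction can be seen in a caricature: gradient descent on a quadratic f(x) = 0.5*h*x² with learning rate exactly 2/h neither converges nor diverges, yet the running time-average of the iterates follows a smooth path. This toy is only an illustration of the object a central flow models; the paper treats generic neural network training, not quadratics:

```python
import numpy as np

# Edge-of-stability caricature: with eta = 2/h the gradient step maps
# x to -x, so the iterates oscillate forever with constant loss, while
# their running time-average is smooth and flat.
h = 4.0
eta = 2.0 / h
x = 1.0
traj = []
for _ in range(200):
    x = x - eta * h * x            # gradient of f(x) = 0.5 * h * x**2
    traj.append(x)
traj = np.array(traj)
smoothed = np.cumsum(traj) / np.arange(1, traj.size + 1)  # time-average
```

Here the raw iterates bounce between +1 and -1 while the averaged trajectory sits at the flow's path (zero), a minimal picture of "oscillatory dynamics with a well-behaved mean."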
