2697 Publications

GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments

Hanlin Zhu, Tianyu Guo, Song Mei, Stuart Russell, N. Ghosh, A. Bietti, Jiantao Jiao

As LLMs are increasingly deployed as agents, agentic reasoning - the ability to combine tool use, especially search, and reasoning - becomes a critical skill. However, it is hard to disentangle agentic reasoning when it is evaluated in complex environments and tasks. Current agent benchmarks often mix agentic reasoning with challenging math reasoning, expert-level knowledge, and other advanced capabilities. To fill this gap, we build a novel benchmark, GSM-Agent, where an LLM agent is required to solve grade-school-level reasoning problems, but is only presented with the question in the prompt without the premises that contain the necessary information to solve the task, and needs to proactively collect that information using tools. Although the original tasks are grade-school math problems, we observe that even frontier models like GPT-5 only achieve 67% accuracy. To understand and analyze the agentic reasoning patterns, we propose the concept of an agentic reasoning graph: cluster the environment's document embeddings into nodes, and map each tool call to its nearest node to build a reasoning path. Surprisingly, we identify that the ability to revisit a previously visited node, widely taken as a crucial pattern in static reasoning, is often missing from the agentic reasoning of many models. Based on this insight, we propose a tool-augmented test-time scaling method that improves LLMs' agentic reasoning performance by adding tools that encourage models to revisit. We expect our benchmark and the agentic reasoning framework to aid future studies of understanding and pushing the boundaries of agentic reasoning.
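
The reasoning-path construction described in the abstract can be illustrated with a minimal sketch. It assumes the document embeddings have already been clustered into node centroids, uses Euclidean distance for the nearest-node lookup, and counts a revisit as a return to a node that was seen earlier and then left. All of these choices, along with the function names and toy data, are illustrative assumptions and not details taken from the paper.

```python
import math

def nearest_node(vec, centroids):
    """Index of the centroid closest to vec (Euclidean distance, an assumption)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))

def reasoning_path(call_embeddings, centroids):
    """Map each tool call to its nearest node, giving a path of node indices."""
    return [nearest_node(v, centroids) for v in call_embeddings]

def count_revisits(path):
    """Count steps that return to a previously visited node after leaving it."""
    seen = set()
    revisits = 0
    prev = None
    for node in path:
        # Staying on the same node in consecutive steps is not a revisit.
        if node != prev and node in seen:
            revisits += 1
        seen.add(node)
        prev = node
    return revisits

# Hypothetical toy data: three node centroids and five tool-call embeddings.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
calls = [(0.1, 0.0), (0.9, 0.1), (0.8, -0.1), (0.0, 0.2), (0.1, 0.9)]

path = reasoning_path(calls, centroids)
revisits = count_revisits(path)
print(path, revisits)
```

In this toy run the agent leaves node 0, spends two calls on node 1, and then returns to node 0, which is the single revisit counted.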

A domain decomposition method for computing the scattering matrix of waveguide circuits

Tristan Goodwill, S. Jiang, M. Rachh, Kosuke Sugita

We analyze and develop numerical methods for time-harmonic wave scattering in metallic waveguide structures of infinite extent. We show that radiation boundary conditions formulated via projectors onto outgoing modes determine the coefficients of propagating modes uniquely, even when the structure supports trapped modes. Building on this, we introduce a fast divide-and-conquer solver that constructs solution operators on subdomains as impedance-to-impedance maps and couples them by enforcing continuity conditions across their interfaces. For Dirichlet waveguides, the computation of impedance-to-impedance maps requires the solution of mixed Dirichlet-Impedance boundary value problems. We construct a second-kind Fredholm integral equation that avoids near-hypersingular operators, requiring only integral operators whose kernels are at most weakly singular. Numerical experiments on large structures with many circuit elements demonstrate substantial efficiency gains: the proposed approach typically outperforms state-of-the-art fast iterative and fast direct solvers by one to two orders of magnitude.

Fast summation of Stokes potentials using a new kernel-splitting in the DMK framework

Ludvig af Klinteberg, L. Greengard, S. Jiang, Anna-Karin Tornberg

Classical Ewald methods for Coulomb and Stokes interactions rely on "kernel-splitting," using decompositions based on Gaussians to divide the resulting potential into a near-field and a far-field component. Here, we show that a more efficient splitting for the scalar biharmonic Green's function can be derived using zeroth-order prolate spheroidal wave functions (PSWFs), which in turn yields new efficient splittings for the Stokeslet, stresslet, and elastic kernels, since these Green's tensors can all be derived from the biharmonic kernel. This benefits all fast summation methods based on kernel splitting, including FFT-based Ewald summation methods, which are suitable for uniform point distributions, and DMK-based methods, which allow for nonuniform point distributions. The DMK (dual-space multilevel kernel-splitting) algorithm we develop here is fast, adaptive, and linear-scaling, both in free space and in a periodic cube. We demonstrate its performance with numerical examples in two and three dimensions.
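
For orientation, the classical Gaussian-based splitting that the abstract takes as its starting point can be written out for the Coulomb kernel. This is the standard textbook Ewald decomposition, shown only as background; the splitting parameter $\alpha$ is notation introduced here, and the PSWF-based splitting of the biharmonic kernel proposed in the paper is not reproduced.

```latex
\frac{1}{r}
  = \underbrace{\frac{\operatorname{erfc}(\alpha r)}{r}}_{\text{near field: decays rapidly in } r}
  + \underbrace{\frac{\operatorname{erf}(\alpha r)}{r}}_{\text{far field: smooth, decays rapidly in Fourier space}},
\qquad
\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt .
```

The near-field term is summed directly over nearby points, while the smooth far-field term is handled in Fourier space.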

The Wnt co-receptor Arrow-LRP5/6 is required for Planar Cell Polarity establishment in Drosophila

Ursula Weber, R. Farhadifar, Marek Mlodzik

Wnt signaling, via β-catenin or the planar cell polarity (PCP) branch, is crucial for development and tissue homeostasis and is linked to many diseases. LRP5/6, arrow (arr) in Drosophila, is the obligate co-receptor in Wnt/β-catenin signaling, with ligand binding to a Frizzled (Fz) family member and LRP5/6 mediating formation of the signalosome complex with Dishevelled (Dsh/Dvl in mammals) and Axin. Current models for Wnt/PCP signaling omit Arr/LRP5/6, and the notion is that it functions without these co-receptors. Here we show that arr/LRP5/6 is positively required in Wnt/PCP signaling. In Drosophila, loss of arr results in PCP-mediated cellular orientation defects, aberrant wing hair formation, and loss of polarity, as described for core PCP factors fz, fmi/Celsr, and dsh. In the eye, arr mutant tissue displays cell fate changes in photoreceptors R3/R4 and chirality defects, classical PCP phenotypes. During Wnt/PCP establishment, defects are manifest as reduced levels of Fmi/Celsr and Dsh along with loss of their asymmetric localization. Functional interactions indicate that Fz can recruit Arr, and this potentiates Fz and Dsh function in PCP signaling in all tissues tested. Taken together, our data support an essential Arr/LRP5/6 function in promoting Wnt/Fz-Dsh PCP-complex activity.

September 22, 2025

Cryo-EM images are intrinsically low dimensional

L. Evans, Octavian-Vlad Murad, P. Cossio, et al.

Simulation-based inference provides a powerful framework for cryoelectron microscopy, employing neural networks in methods like CryoSBI to infer biomolecular conformations via learned latent representations. This latent space represents a rich opportunity, encoding valuable information about the physical system and the inference process. Harnessing this potential hinges on understanding the underlying geometric structure of these representations. We investigate this structure by applying manifold learning techniques to CryoSBI representations of a simulated benchmark dataset and both simulated and experimental images of hemagglutinin. We reveal that these high-dimensional data inherently populate low-dimensional, smooth manifolds, with simulated data effectively covering the experimental counterpart. By characterizing the manifold's geometry using Diffusion Maps and identifying its principal axes of variation via coordinate interpretation methods, we establish a direct link between the latent structure and key physical parameters. Discovering this intrinsic low-dimensionality and interpretable geometric organization not only validates the CryoSBI approach but also enables us to learn more from the data structure and provides opportunities for improving future inference strategies by exploiting this revealed manifold geometry.

September 22, 2025

Towards Seamless Interoperability of MPI-OpenMP Applications

B. Smith, M. Berger, Junchao Zhang, Hui Zhou

A chasm exists between mathematical software libraries written for MPI-based applications and those written for OpenMP applications. Recently, however, PETSc has enabled the simple use of its MPI-based linear solvers from OpenMP applications. Separately, the MPICH MPI development team has started a new project to allow almost seamless MPI use in OpenMP applications. Both proposed approaches would result in a similar user experience. We discuss the reasons for these projects and their potential for providing more numerical library choices for OpenMP applications, including the unlimited assortment of linear solvers available in PETSc. In addition, we present the performance of an application using the first approach, demonstrating its efficacy.

Predicting partially observable dynamical systems via diffusion models with a multiscale inference scheme

R. Morel, Francesco Pio Ramunno, Jeff Shen, A. Bietti, K. Cho, M. Cranmer, S. Golkar, Olexandr Gugnin, G. Krawezik, et al.

Conditional diffusion models provide a natural framework for probabilistic prediction of dynamical systems and have been successfully applied to fluid dynamics and weather prediction. However, in many settings, the available information at a given time represents only a small fraction of what is needed to predict future states, either due to measurement uncertainty or because only a small fraction of the state can be observed. This is true for example in solar physics, where we can observe the Sun’s surface and atmosphere, but its evolution is driven by internal processes for which we lack direct measurements. In this paper, we tackle the probabilistic prediction of partially observable, long-memory dynamical systems, with applications to solar dynamics and the evolution of active regions. We show that standard inference schemes, such as autoregressive rollouts, fail to capture long-range dependencies in the data, largely because they do not integrate past information effectively. To overcome this, we propose a multiscale inference scheme for diffusion models, tailored to physical processes. Our method generates trajectories that are temporally fine-grained near the present and coarser as we move farther away, which enables capturing long-range temporal dependencies without increasing computational cost. When integrated into a diffusion model, we show that our inference scheme significantly reduces the bias of the predicted distributions and improves rollout stability.
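
The idea of a trajectory that is fine-grained near the present and coarser farther away can be pictured with a small sketch of a time grid. The abstract does not specify how the spacing grows, so the doubling stride per level used here is purely an illustrative assumption, as are the function name `multiscale_offsets` and its parameters.

```python
def multiscale_offsets(steps_per_level, num_levels):
    """Future time offsets whose spacing doubles at each level (assumed rule)."""
    offsets = []
    t = 0
    for level in range(num_levels):
        stride = 2 ** level  # 1, 2, 4, ...: coarser the farther from the present
        for _ in range(steps_per_level):
            t += stride
            offsets.append(t)
    return offsets

# Hypothetical setting: three levels with three frames each.
offsets = multiscale_offsets(steps_per_level=3, num_levels=3)
print(offsets)
```

With these settings, nine generated frames span a horizon of 21 time steps, where a uniform unit-stride grid would need 21 frames to reach the same horizon.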

AION-1: Omnimodal Foundation Model for Astronomical Sciences

L. Parker, F. Lanusse, Jeff Shen, Ollie Liu, Tom Hehir, L. Sarra, Lucas Meyer, Micah Bowles, S. Wagner-Carena, H. Qu, S. Golkar, A. Bietti, R. Morel, et al.

While foundation models have shown promise across a variety of fields, astronomy lacks a unified framework for joint modeling across its highly diverse data modalities. In this paper, we present AION-1, the first large-scale multimodal foundation family of models for astronomy. AION-1 enables arbitrary transformations between heterogeneous data types using a two-stage architecture: modality-specific tokenization followed by transformer-based masked modeling of cross-modal token sequences. Trained on over 200M astronomical objects, AION-1 demonstrates strong performance across regression, classification, generation, and object retrieval tasks. Beyond astronomy, AION-1 provides a scalable blueprint for multimodal scientific foundation models that can seamlessly integrate heterogeneous combinations of real-world observations. Our model release is entirely open source, including the dataset, training script, and weights.

ArchVelo: Archetypal Velocity Modeling for Single-cell Multi-omic Trajectories

M. Avdeeva, Sarah Walker, et al.

Inferring dynamic cellular processes from static single-cell measurements remains a central challenge in genomics. Here we introduce ArchVelo, a new method for modeling gene regulation and inferring cell trajectories using single-cell simultaneous chromatin accessibility (scATAC-seq) and transcriptomic (scRNA-seq) profiling. ArchVelo represents chromatin accessibility as a set of archetypes (shared regulatory programs) and models their dynamic influence on transcription. Compared to previous methods, ArchVelo improves inference accuracy and gene-level latent time alignment, and enables identification of the underlying transcription factor activity. We benchmark ArchVelo on developing mouse brain and human hematopoiesis datasets and apply it to CD8 T cells responding to viral infection, revealing distinct trajectories of differentiation and proliferation. Focusing on the progenitor CD8 T cell population with key roles in sustaining immune responses and translationally linked to immunotherapy outcomes, we identify a previously uncharacterized differentiation trajectory from Ccr6− to Ccr6+ progenitors, shared between acute and chronic infection. In sum, ArchVelo provides a principled framework for modeling dynamic gene regulation in multi-omic single-cell data across biological systems.

September 17, 2025

Coherent dynamics of thalamic head-direction neurons irrespective of input

G. Viejo, Sofia Skromne Carrasco, Adrien Peyrache

While the thalamus is known to relay and modulate sensory signals to the cortex, whether it also participates in active computation and intrinsic signal generation remains unresolved. The anterodorsal nucleus of the thalamus broadcasts the head-direction (HD) signal, which is generated in the brainstem, particularly in the upstream lateral mammillary nucleus, and thalamic HD cells remain coordinated even during sleep. Here, by recording and manipulating neuronal activity along the mammillary–thalamic–cortical pathway, we show that coherence among thalamic HD cells persists even when their upstream inputs are decorrelated, particularly during non-Rapid Eye Movement sleep. These findings suggest that thalamic circuits are sufficient to generate and maintain coherent population dynamics in the absence of structured input.

September 16, 2025