2697 Publications

cryoJAX: A Cryo-electron Microscopy Image Simulation Library In JAX

Michael J. O'Brien, S. Hanson, D. Needleman, et al.

While cryo-electron microscopy (cryo-EM) has come to prominence in the last decade due to its ability to resolve biomolecular complexes at atomic resolution, advancements in experimental and computational methods have made cryo-EM promising for investigating intracellular organization and heterogeneous molecular states. A primary challenge for these alternative applications is the development of techniques for cryo-EM data analysis, which are very computationally demanding. To this end, it is advantageous to leverage advanced scientific computing frameworks for statistical analysis. One such framework is JAX, an emerging array-oriented Python numerical computing package for automatic differentiation and vectorization with a growing ecosystem for statistical inference and machine learning. We have developed cryoJAX, a cryo-EM image simulation library for building computational data analysis applications in JAX. CryoJAX is a flexible modeling language for cryo-EM image formation and therefore can support a wide range of data analysis downstream. By integrating with the JAX ecosystem, cryoJAX enables the development and deployment of algorithms for the growing breadth of scientific applications for cryo-EM.

October 24, 2025

Scalable inference of functional neural connectivity at submillisecond timescales

A. Medvedeva, E. Balzani, A. Williams, Stephen L Keeley

The Poisson Generalized Linear Model (GLM) is a foundational tool for analyzing neural spike train data. However, standard implementations rely on discretizing spike times into binned count data, limiting temporal resolution and scalability. Here, we develop Monte Carlo (MC) methods and polynomial approximations (PA) to the continuous-time analog of these models, and show them to be advantageous over their discrete-time counterparts. Further, we propose using a set of exponentially scaled Laguerre polynomials as an orthogonal temporal basis, which improves filter identification and yields closed-form integral solutions under the polynomial approximation. Applied to both synthetic and real spike-time data from rodent hippocampus, our methods demonstrate superior accuracy and scalability compared to traditional binned GLMs, enabling functional connectivity inference in large-scale neural recordings that are temporally precise on the order of synaptic dynamical timescales and in agreement with known anatomical properties of hippocampal subregions. We provide open-source implementations of both MC and PA estimators, optimized for GPU acceleration, to facilitate adoption in the neuroscience community.
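
As a rough illustration of the continuous-time idea (a sketch, not the authors' implementation), the log-likelihood of an inhomogeneous Poisson process is the sum of log-intensities at the exact spike times minus the integral of the intensity over the recording window, and that integral can be estimated by Monte Carlo sampling. The intensity form, function names, and parameters below are hypothetical and chosen only for illustration.

```python
import numpy as np

def log_intensity(t, weights, rate=1.0):
    # Hypothetical log-intensity: a bias plus one decaying exponential feature.
    return weights[0] + weights[1] * np.exp(-rate * t)

def mc_log_likelihood(spike_times, weights, T, n_samples=10_000, seed=0):
    rng = np.random.default_rng(seed)
    # Sum of log-intensities evaluated at the exact spike times (no binning).
    spike_term = np.sum(log_intensity(spike_times, weights))
    # Monte Carlo estimate of the integral of the intensity over [0, T].
    u = rng.uniform(0.0, T, size=n_samples)
    integral = T * np.mean(np.exp(log_intensity(u, weights)))
    return spike_term - integral
```

With a constant intensity (second weight set to zero) the integral is exact, which gives a quick sanity check: three spikes at rate 2 over a window of length 5 should yield 3·log 2 − 10.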

October 23, 2025

Examining Age-Bias and Stereotypes of Aging in LLMs

Sherwin Dewan, Ismail Shaikh, A. Sahoo

Large Language Models (LLMs) are increasingly being used across applications, ranging from content generation to decision-making, raising concerns about biases embedded in them. While biases related to gender, race, and culture have been studied extensively, age bias and stereotypes of aging in LLMs remain underexplored. This study analyzes LLM-generated responses to prompts related to aging, revealing stereotypical biases about aging pertaining to technology proficiency, cognitive and physical decline, and job roles. We noted that even responses without explicit age bias mentioned ageist stereotypes. We discuss considerations for involving older adults’ perspectives through human-in-the-loop methodologies while exercising caution about aspects like internalized ageism.

October 22, 2025

Neutral Gas Phase Distribution from H I Morphology: Phase Separation with Scattering Spectra and Variational Autoencoders

Minjie Lei, S. E. Clark, R. Morel, et al.

Unraveling the multiphase structure of the diffuse interstellar medium as traced by neutral hydrogen (H I) is essential to understanding the lifecycle of the Milky Way. However, H I phase separation is a challenging and underconstrained problem. The neutral gas phase distribution is often inferred from the spectral line structure of H I emission. In this work, we develop a data-driven phase-separation method that extracts H I phase structure solely from the spatial morphology of H I emission intensity structures. We combine scattering spectra (SS) statistics with a Gaussian-mixture variational autoencoder model to (1) derive an interpretable statistical model of different H I phases from their multiscale morphological structures, and (2) decompose the 2D channel maps of GALFA-H I emission in diffuse high-latitude (|b| > 30°) regions over narrow velocity channels (Δv = 3 km/s) into cold neutral medium (CNM), warm neutral medium (WNM), and noise components. We integrate our CNM map over velocity channels to compare it to an existing map produced by a spectrum-based method. We find that the two maps are highly correlated, but ours recovers more spatially coherent structures at small scales. Our work illustrates and quantifies a clear physical connection between the H I morphology and H I phase structure, and it unlocks a new avenue for improving future phase-separation techniques by making use of both H I spectral and spatial information to decompose H I in 3D position–position–velocity space. These results are consistent with a physical picture where processes that drive H I phase transitions also shape the morphology of H I gas, imprinting a sparse, filamentary CNM that forms out of a diffuse, extended WNM.

Disrupted developmental signaling induces novel transcriptional states

Aleena Patel, Vanessa Gonzalez, S. Shvartsman, M. Avdeeva

Signaling pathways induce stereotyped transcriptional changes as stem cells progress into mature cell types during embryogenesis. Signaling perturbations are necessary to discover which genes are responsive or insensitive to pathway activity. However, gene regulation is additionally dependent on cell state-specific factors like chromatin modifications or transcription factor binding. Thus, transcriptional profiles need to be assayed in single cells to identify potentially multiple, distinct perturbation responses among heterogeneous cell states in an embryo. In perturbation studies, comparing heterogeneous transcriptional states among experimental conditions often requires samples to be collected over multiple independent experiments, which can introduce confounding batch effects. We present Design-Aware Integration of Single Cell ExpEriments (DAISEE), a new algorithm that models perturbation responses in single-cell datasets collected according to complex experimental designs. We demonstrate that DAISEE improves upon a previously available integrative nonnegative matrix factorization framework, more efficiently separating perturbation responses from confounding variation. We use DAISEE to integrate newly collected single-cell RNA sequencing datasets from 5-h-old zebrafish embryos expressing optimized photoswitchable MEK (psMEK), which globally activates the extracellular signal-regulated kinase (ERK), a signaling molecule involved in many cell specification events. psMEK drives some cells that are normally not exposed to ERK signals toward other wild type states and induces novel states expressing early-acting endothelial genes. Overactive signaling is therefore capable of producing unexpected gene expression states in developing embryos.

Universal Spectral Tokenization via Self-Supervised Panchromatic Representation Learning

Jeff Shen, F. Lanusse, Liam Holden Parker, L. Sarra, A. Bietti, R. Morel, et al.

Sequential scientific data span many resolutions and domains, and unifying them into a common representation is a key step toward developing foundation models for the sciences. Astronomical spectra exemplify this challenge: massive surveys have collected millions of spectra across a wide range of wavelengths and resolutions, yet analyses remain fragmented across spectral domains (e.g., optical vs. infrared) and object types (e.g., stars vs. galaxies), limiting the ability to pool information across datasets. We present a deep learning model that jointly learns from heterogeneous spectra in a self-supervised manner. Our universal spectral tokenizer processes spectra from a variety of object types and resolutions directly on their native wavelength grids, producing intrinsically aligned, homogeneous, and physically meaningful representations that can be efficiently adapted to achieve competitive performance across a range of downstream tasks. For the first time, we demonstrate that a single model can unify spectral data across resolutions and domains, suggesting that our model can serve as a powerful building block for foundation models in astronomy—and potentially extend to other scientific domains with heterogeneous sequential data, such as climate and healthcare.

The Helmholtz Dirichlet and Neumann problems on piecewise smooth open curves

Johan Helsing, S. Jiang

A numerical scheme is presented for solving the Helmholtz equation with Dirichlet or Neumann boundary conditions on piecewise smooth open curves, where the curves may have corners and multiple junctions. Existing integral equation methods for smooth open curves rely on analyzing the exact singularities of the density at endpoints for associated integral operators, explicitly extracting these singularities from the densities in the formulation, and using global quadrature to discretize the boundary integral equation. Extending these methods to handle curves with corners and multiple junctions is challenging because the singularity analysis becomes much more complex, and constructing high-order quadrature for discretizing layer potentials with singular and hypersingular kernels and singular densities is nontrivial. The proposed scheme is built upon the following two observations. First, the single-layer potential operator and the normal derivative of the double-layer potential operator serve as effective preconditioners for each other locally. Second, the recursively compressed inverse preconditioning (RCIP) method can be extended to address “implicit” second-kind integral equations. The scheme is high-order, adaptive, and capable of handling corners and multiple junctions without prior knowledge of the density singularity. It is also compatible with fast algorithms, such as the fast multipole method. The performance of the scheme is illustrated with several numerical examples.

Cell size reduction scales spindle elongation but not chromosome segregation in C. elegans

Chukwuebuka William Okafornta, R. Farhadifar, M. Shelley, D. Needleman, et al.

How embryos adapt their internal cellular machinery to reductions in cell size during development remains a fundamental question in cell biology [1–11]. Here, we use high-resolution lattice light-sheet fluorescence microscopy and automated image analysis to quantify lineage-resolved mitotic spindle and chromosome segregation dynamics from the 2- to 64-cell stages in Caenorhabditis elegans embryos. While spindle length scales with cell size across both wild-type and size-perturbed embryos, chromosome segregation dynamics remain largely invariant, suggesting that distinct mechanisms govern these mitotic processes. Combining femtosecond laser ablation [12,13] with large-scale electron tomography [14], we find that central spindle microtubules mediate chromosome segregation dynamics and remain uncoupled from cell size across all stages of early development. In contrast, spindle elongation is driven by cortically anchored motor proteins and astral microtubules, rendering it sensitive to cell size [12,13,15–17]. Incorporating these experimental results into an extended stoichiometric model for both the spindle and chromosomes, we find that allowing only cell size and microtubule catastrophe rates to vary reproduces elongation dynamics across development. The same model also accounts for centrosome separation and pronuclear positioning in the one-cell C. elegans embryo [18], spindle-length scaling across nematode species spanning ~100 million years of divergence [17], and spindle rotation in human cells [19]. Thus, a unified stoichiometric framework provides a predictive, mechanistic account of spindle and nuclear dynamics across scales and species.

October 14, 2025

Cell size reduction drives spindle scaling but not chromosome segregation in C. elegans

Chukwuebuka William Okafornta, R. Farhadifar, M. Shelley, D. Needleman, et al.

How embryos adapt their internal cellular machinery to reductions in cell size during development remains a fundamental question in cell biology [1–11]. Here, we use high-resolution lattice light-sheet fluorescence microscopy and automated image analysis to quantify lineage-resolved mitotic spindle and chromosome segregation dynamics from the 2- to 64-cell stages in Caenorhabditis elegans embryos. While spindle length scales with cell size across both wild-type and size-perturbed embryos, chromosome segregation dynamics remain largely invariant, suggesting that distinct mechanisms govern these mitotic processes. Combining femtosecond laser ablation [12,13] with large-scale electron tomography [14], we find that central spindle microtubules mediate chromosome segregation dynamics and remain uncoupled from cell size across all stages of early development. In contrast, spindle elongation is driven by cortically anchored motor proteins and astral microtubules, rendering it sensitive to cell size [12,13,15–17]. Incorporating these experimental results into an extended stoichiometric model for both the spindle and chromosomes, we find that allowing only cell size and microtubule catastrophe rates to vary reproduces elongation dynamics across development. The same model also accounts for centrosome separation and pronuclear positioning in the one-cell C. elegans embryo [18], spindle-length scaling across nematode species spanning ~100 million years of divergence [17], and spindle rotation in human cells [19]. Thus, a unified stoichiometric framework provides a predictive, mechanistic account of spindle and nuclear dynamics across scales and species.

October 14, 2025

Unconditional CNN denoisers contain sparse semantic representation of images

Generative diffusion models learn probability densities over diverse image datasets by estimating the score with a neural network trained to remove noise. Despite their remarkable success in generating high-quality images, the internal mechanisms of the underlying score networks are not well understood. Here, we examine the image representation that arises from score estimation in a fully-convolutional unconditional UNet. We show that the middle block of the UNet decomposes individual images into sparse subsets of active channels, and that the vector of spatial averages of these channels can provide a nonlinear representation of the underlying clean images. Euclidean distances in this representation space are semantically meaningful, even though no conditioning information is provided during training. We develop a novel algorithm for stochastic reconstruction of images conditioned on this representation: the synthesis using the unconditional model is "self-guided" by the representation extracted from that very same model. For a given representation, the common patterns in the set of reconstructed samples reveal the features captured in the middle block of the UNet. Together, these results show, for the first time, that a measure of semantic similarity emerges, unsupervised, solely from the denoising objective.
