2697 Publications

Comparing cryo-EM methods and molecular dynamics simulation to investigate heterogeneity in ligand-bound TRPV1

M. Astore, David Silva-Sánchez, R. Blackwell, P. Cossio, S. Hanson

Cryogenic electron microscopy (cryo-EM) has emerged as a powerful method for resolving the structure of biological macromolecules. Recently, several computational methods have been developed to study the heterogeneity of molecules in single-particle cryo-EM. In this study, we analyze a publicly available dataset of TRPV1 using five such methods: 3DFlex, 3DVA, cryoDRGN, ManifoldEM, and Bayesian ensemble reweighting. We find significant heterogeneity, but each method produces different results, with some detecting only compositional or only conformational heterogeneity. To compare these diverse results, we develop AnaVox to quantitatively determine agreement between heterogeneity methods. Furthermore, applying Bayesian ensemble reweighting combined with molecular dynamics simulations supports the presence of rarer states within the sample. This study shows that although current methods reveal the presence of heterogeneity, their stochasticity and potential bias present challenges for their routine use. However, with future development, these tools will enable the use of cryo-EM data for quantitative biophysical investigations.
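Ensemble reweighting assigns population weights to a fixed set of candidate structures (for example, frames from a molecular dynamics simulation) so that the weighted ensemble best explains the particle images. As a rough illustration only, the sketch below replaces the Bayesian treatment with a simpler maximum-likelihood estimate. It assumes a precomputed matrix `log_lik[i, m]` holding the log-likelihood of image `i` under structure `m`. This is a hypothetical input, and producing it requires a cryo-EM forward model that is not shown. The sketch then fits the weights with the standard expectation-maximization update for mixture weights. The function name `reweight_ensemble` is illustrative and is not part of any published tool.

```python
import numpy as np


def reweight_ensemble(log_lik, n_iter=500, tol=1e-10):
    """Maximum-likelihood ensemble weights via EM (a simplified, non-Bayesian stand-in).

    log_lik: array of shape (n_images, n_structures) holding log p(image_i | structure_m),
             assumed to be precomputed with some cryo-EM forward model.
    Returns the weights and the per-iteration total log-likelihood.
    """
    _, n_structures = log_lik.shape
    w = np.full(n_structures, 1.0 / n_structures)  # start from uniform weights
    history = []
    for _ in range(n_iter):
        # E-step: log of w_m * p(image_i | m), normalized per image with a stable log-sum-exp
        a = log_lik + np.log(w)
        a_max = a.max(axis=1, keepdims=True)
        log_norm = a_max[:, 0] + np.log(np.exp(a - a_max).sum(axis=1))
        resp = np.exp(a - log_norm[:, None])  # responsibilities; each row sums to 1
        history.append(log_norm.sum())  # log-likelihood of the current weights
        # M-step: each new weight is the mean responsibility over all images
        w = resp.mean(axis=0)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return w, np.array(history)
```

A Bayesian version would instead place a prior (for example, a Dirichlet distribution) on the weights and sample or approximate their posterior. This would also yield uncertainty estimates, which the point estimate above lacks.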

Improving Cryo-EM Optimization Robustness with an Optimal Transport Loss Function for Noisy Images

Geoffrey Woollard, David Herreros, P. Cossio, et al.

Many tasks in single-particle cryo-electron microscopy (cryo-EM), such as 2D/3D classification and homogeneous/heterogeneous reconstruction, require optimizing model parameters to minimize the discrepancy between observed data and a forward model. The standard mean squared error (MSE) loss function is computationally efficient but suffers from a rugged, non-convex loss landscape, particularly for high-resolution heterogeneity inference. In this work, we investigate the practical utility of sliced Wasserstein (SW) distances. We implement exact W2 estimators (inverse-CDF and greedy matching) of projections alongside a computationally efficient proxy based on the L2 norm of CDFs, a formulation akin to the sliced Cramér–von Mises distance. We establish the latter as a robust, fully differentiable workhorse for the cryo-EM forward model. We evaluate its performance against the MSE in joint inference tasks recovering pose, CTF parameters, and conformational heterogeneity. Our results demonstrate that SW significantly broadens the basin of attraction, enabling robust gradient-based optimization from distant initializations where MSE fails. Using a helical spiral toy model, we highlight how SW losses are sensitive to per-particle contrast: miscalibration of the background noise level can induce geometric bias in the inferred structure. We show that this bias is manageable through a joint optimization strategy that treats background contrast as a learnable parameter. Finally, we validate the approach on a synthetic dataset using the Zernike3D framework, showing that the SW loss yields accurate landscape representations comparable to those obtained with MSE. These findings establish SW as a powerful tool for navigating the rugged landscapes of cryo-EM forward model parameters.
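The two families of losses can be illustrated on a pair of 2D images treated as nonnegative mass distributions. This is a minimal sketch, not the authors' implementation. Each image is projected onto a set of directions by binning pixel coordinates weighted by intensity. The sliced W2 distance is then approximated from inverse CDFs evaluated at common quantile levels. The cheaper proxy is the L2 norm of the difference between projected CDFs, averaged over directions. The function names, the binning scheme, and the numbers of bins and quantile levels are assumptions made for illustration. Real cryo-EM images are noisy and can take negative values, which this sketch does not handle, and the greedy-matching estimator is not shown.

```python
import numpy as np


def project_to_cdfs(img, thetas, n_bins=64):
    """Project a nonnegative 2D image onto each direction in `thetas` and return 1D CDFs.

    Returns (cdfs of shape (len(thetas), n_bins), bin centers, bin width).
    """
    ny, nx = img.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    # Pixel coordinates relative to the image center
    xs = (xx - (nx - 1) / 2).ravel()
    ys = (yy - (ny - 1) / 2).ravel()
    # A common bin range covering every projected pixel, shared by all images of this shape
    r = 0.5 * np.hypot(nx, ny)
    edges = np.linspace(-r, r, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mass = img.ravel() / img.sum()  # normalize the image to unit total mass
    cdfs = []
    for th in thetas:
        s = xs * np.cos(th) + ys * np.sin(th)  # signed position along direction th
        hist, _ = np.histogram(s, bins=edges, weights=mass)
        cdfs.append(np.cumsum(hist))
    return np.array(cdfs), centers, edges[1] - edges[0]


def sliced_w2(img_a, img_b, thetas, n_bins=64, n_quantiles=200):
    """Sliced W2: RMS over directions of the 1D W2 distance computed from inverse CDFs."""
    fa_all, x, _ = project_to_cdfs(img_a, thetas, n_bins)
    fb_all, _, _ = project_to_cdfs(img_b, thetas, n_bins)
    u = (np.arange(n_quantiles) + 0.5) / n_quantiles  # midpoint quantile levels in (0, 1)
    last = len(x) - 1
    sq = []
    for fa, fb in zip(fa_all, fb_all):
        # Generalized inverse CDF: first bin whose cumulative mass reaches level u
        qa = x[np.minimum(np.searchsorted(fa, u), last)]
        qb = x[np.minimum(np.searchsorted(fb, u), last)]
        sq.append(np.mean((qa - qb) ** 2))  # midpoint-rule estimate of W2^2 on this slice
    return np.sqrt(np.mean(sq))


def sliced_cdf_l2(img_a, img_b, thetas, n_bins=64):
    """Cheaper proxy: RMS over directions of the L2 norm of the CDF difference."""
    fa_all, _, dx = project_to_cdfs(img_a, thetas, n_bins)
    fb_all, _, _ = project_to_cdfs(img_b, thetas, n_bins)
    per_direction = np.sum((fa_all - fb_all) ** 2, axis=1) * dx
    return np.sqrt(np.mean(per_direction))
```

Consider two copies of the same blob translated by a vector t. Each projection is shifted by the component of t along that direction, so both sliced quantities keep growing with the size of the shift even after the blobs stop overlapping. The MSE between the images, by contrast, saturates once the blobs are disjoint. This is a simple way to see why a sliced loss can offer a broader basin of attraction than MSE.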

December 27, 2025

Age-related nigral downregulation of the Parkinson’s risk factor FAM49B primes human microglia for inflammaging

Jacqueline Martin, C. Park, O. Troyanskaya, et al.

Parkinson’s Disease (PD) is characterized by the loss of dopaminergic neurons in the substantia nigra pars compacta (SNpc), which is associated with changes in microglial function. While age remains the biggest risk factor, the underlying molecular cause of PD onset and its concurrent neuroinflammation are not well understood. Many identified PD risk genes have been directly linked to dopamine neuron impairment, while others are linked to immune cell function. In this study, we found that the PD risk gene FAM49B is critically expressed in microglia of the human SNpc and is downregulated with age and PD. We used human and murine microglial cells to demonstrate the role of FAM49B in regulating fundamental microglial functions such as cytoskeletal maintenance, migration, surface adherence, energy homeostasis, autophagy, and, importantly, the inflammatory response. Downregulation of microglial FAM49B, as observed in the SNpc of aging individuals, led to significant alterations in these cellular functions, which are associated with increased microglial activation. Thus, our study highlights novel cell-type-specific roles of FAM49B and provides a potential mechanism for the susceptibility to neuroinflammation and reactive gliosis observed in both PD and normal aging.

December 19, 2025

Disentangled representations via score-based variational autoencoders

Benjamin S. H. Lyo, C. Savin, E. P. Simoncelli

We present the Score-based Autoencoder for Multiscale Inference (SAMI), a method for unsupervised representation learning that combines the theoretical frameworks of diffusion models and VAEs. By unifying their respective evidence lower bounds, SAMI formulates a principled objective that learns representations through score-based guidance of the underlying diffusion process. The resulting representations automatically capture meaningful structure in the data: SAMI recovers ground-truth generative factors in our synthetic dataset, learns factorized, semantic latent dimensions from complex natural images, and encodes video sequences into latent trajectories that are straighter than those of alternative encoders, despite being trained exclusively on static images. Furthermore, SAMI can extract useful representations from pre-trained diffusion models with minimal additional training. Finally, the explicitly probabilistic formulation provides new ways to identify semantically meaningful axes in the absence of supervised labels, and its mathematical exactness allows us to make formal statements about the nature of the learned representation. Overall, these results indicate that implicit structural information in diffusion models can be made explicit and interpretable through synergistic combination with a variational autoencoder.

December 18, 2025

Stabilizing the singularity swap quadrature for near-singular line integrals

David Krantz, A. Barnett, Anna-Karin Tornberg

Singularity swap quadrature (SSQ) is an effective method for evaluating, at nearby targets, potentials due to densities on curves in three dimensions. While highly accurate in most settings, it is known to suffer from catastrophic cancellation when the kernel exhibits both near-vanishing numerators and strong singularities, as arises with scalar double-layer potentials or tensorial kernels in Stokes flow or linear elasticity. This precision loss turns out to be tied to the interpolation basis, namely monomial (for open curves) or Fourier (for closed curves). We introduce a simple yet powerful remedy: target-specific translated monomial and Fourier bases that explicitly incorporate the near-vanishing behavior of the kernel numerator. We combine this with a stable evaluation of the constant term, which now dominates the integral, significantly reducing cancellation. We show that our approach achieves close to machine precision for prototype integrals and up to ten orders of magnitude lower error than standard SSQ at extremely close evaluation distances, without significant additional computational cost.
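The change of basis behind this remedy can be sketched for the open-curve (monomial) case. Assume a smooth factor of the integrand has already been interpolated by a polynomial with monomial coefficients `c` (constant term first). The hypothetical helper `taylor_shift` below re-expands that polynomial in the translated basis (t - t0)^k about a target-specific point `t0`, using repeated synthetic division. This shows only the basis change. The SSQ recurrences, the Fourier-basis variant for closed curves, the stable evaluation of the constant term, and the computation of `t0` itself are not reproduced here.

```python
def taylor_shift(c, t0):
    """Re-expand p(t) = sum_k c[k] * t**k as p(t) = sum_k d[k] * (t - t0)**k.

    Coefficients are ordered constant term first; t0 may be real or complex.
    Uses repeated synthetic division by (t - t0), O(n^2) operations.
    """
    d = list(c)
    n = len(d)
    for j in range(n - 1):
        # One synthetic-division pass on d[j:]: afterwards d[j] is the remainder,
        # i.e. the j-th Taylor coefficient of p about t0
        for k in range(n - 2, j - 1, -1):
            d[k] += t0 * d[k + 1]
    return d
```

After the shift, `d[0]` equals the polynomial's value at `t0`. When that value is nearly zero, the translated basis exposes it as a single small coefficient instead of leaving it implicit in a sum of larger monomial terms.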

From labels to latents: revealing state-dependent hippocampal computations with Jump Latent Variable Model

S. Zheng, Ipshita Zutshi, Roman Huszár, Yiyao Zhang, Mursel Karadas, György Buzsáki, A. Williams

Neural activity is usually interpreted by imposing external labels (e.g., stimuli or position during locomotion) and decoding within that space (e.g., replay). While powerful, such supervision can mask structure in the data that does not correspond to the label. Unsupervised methods, in turn, often assume smooth latent dynamics and miss genuine discontinuities. We introduce a conceptually simple, computationally efficient latent variable model that infers both (i) the latent variables organizing population activity and (ii) whether their dynamics are continuous or fragmented in time. Fitting reduces to an expectation-maximization (EM) procedure that alternates two operations familiar to systems neuroscience, tuning-curve estimation and label decoding, without requiring external labels. Applied to rodent hippocampal spike recordings, the model reveals distinct population patterns at the same physical position that supervised spatial decoding fails to detect. While the learned latents exhibit place-field-like tuning, their reactivation patterns are better distinguished by behavioral states. The model further identifies a continuity-fragmentation axis that characterizes population activity across sleep-wake brain states and is modulated by cholinergic inputs. By not relying on externally imposed spatial labels, our approach exposes structure that supervised approaches obscure and provides a powerful tool for datasets lacking behavioral tracking.
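The alternation between tuning-curve estimation and decoding can be illustrated with a deliberately simplified model. This model is an assumption for illustration and is not the authors' Jump Latent Variable Model. Each time bin has one discrete latent state, and spike counts are Poisson with state-specific tuning curves. There are no temporal dynamics, so the continuity/fragmentation component is omitted. Under these assumptions, EM reduces to fitting a Poisson mixture. The E-step softly decodes the latent state of every time bin from population activity, and the M-step re-estimates each tuning curve as a responsibility-weighted mean spike count. The function name and the farthest-point initialization are heuristic choices made for this sketch.

```python
import numpy as np


def poisson_mixture_em(counts, n_states, n_iter=100, seed=0):
    """EM for a Poisson mixture over discrete latent states (no temporal dynamics).

    counts: (T, N) spike counts for T time bins and N neurons.
    Returns tuning curves (n_states, N), posteriors (T, n_states), and log-likelihood
    history (up to an additive constant that does not depend on the parameters).
    """
    rng = np.random.default_rng(seed)
    T, _ = counts.shape
    # Farthest-point initialization: start from one random time bin, then repeatedly add
    # the bin farthest (in squared Euclidean distance) from those already chosen
    chosen = [int(rng.integers(T))]
    for _ in range(1, n_states):
        d2 = ((counts[:, None, :] - counts[chosen][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        chosen.append(int(d2.argmax()))
    rates = counts[chosen].astype(float) + 0.1  # small offset keeps log(rates) finite
    log_pi = np.full(n_states, -np.log(n_states))
    history = []
    for _ in range(n_iter):
        # E-step ("decoding"): Poisson log-likelihood of each bin under each state, dropping log(y!)
        ll = counts @ np.log(rates).T - rates.sum(axis=1) + log_pi  # (T, n_states)
        m = ll.max(axis=1, keepdims=True)
        log_norm = m[:, 0] + np.log(np.exp(ll - m).sum(axis=1))
        post = np.exp(ll - log_norm[:, None])
        history.append(log_norm.sum())
        # M-step ("tuning-curve estimation"): responsibility-weighted mean counts per state
        weight = post.sum(axis=0) + 1e-12  # guard against an empty state
        rates = post.T @ counts / weight[:, None] + 1e-9  # floor keeps log(rates) finite
        log_pi = np.log(weight / T)
    return rates, post, np.array(history)
```

A transition model between states, with a switch between smooth and jump-like transitions, would turn this into a hidden Markov model closer in spirit to the model described above. That extension is not shown.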

Facilitating analysis of open neurophysiology data on the DANDI Archive using large language model tools

The DANDI Archive is a key resource for sharing open neurophysiology data, hosting over 400 datasets in the Neurodata Without Borders (NWB) format. While these datasets hold tremendous potential for reanalysis and discovery, many researchers face barriers to reuse, including unfamiliarity with access methods and difficulty identifying relevant content. Here we introduce an AI-powered, agentic chat assistant and a notebook generation pipeline. The chat assistant serves as an interactive tool for exploring DANDI datasets. It leverages large language models (LLMs) and integrates with agentic tools to guide users through data access, visualization, and preliminary analysis. The notebook generator analyzes dataset structure with minimal human input, executing inspection scripts and generating visualizations. It then produces an instructional Python notebook tailored to the dataset. We applied this system to 12 recent datasets. Review by neurophysiology data specialists found the generated notebooks to be generally accurate and well-structured, with most notebooks rated as “very helpful.” This work demonstrates how AI can support FAIR principles by leveraging data standards and lowering barriers to data reuse and engagement.

A Model-Guided Neural Network Method for the Inverse Scattering Problem

Olivia Tsang, O. Melia, Vasileios Charisopoulos, Jeremy Hoskins, Rebecca Willett

Inverse medium scattering is an ill-posed, nonlinear wave-based imaging problem arising in medical imaging, remote sensing, and non-destructive testing. Machine learning (ML) methods offer increased inference speed and flexibility in capturing prior knowledge of imaging targets relative to classical optimization-based approaches; however, they perform poorly in regimes where the scattering behavior is highly nonlinear. A key limitation is that ML methods struggle to incorporate the physics governing the scattering process, which is typically inferred implicitly from the training data or loosely enforced via architectural design. In this paper, we present a method that endows a machine learning framework with explicit knowledge of problem physics, in the form of a differentiable solver representing the forward model. The proposed method progressively refines reconstructions of the scattering potential using measurements at increasing wave frequencies, following a classical strategy to stabilize recovery. Empirically, we find that our method provides high-quality reconstructions at a fraction of the computational or sampling costs of competing approaches.
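The classical frequency-continuation idea can be illustrated on a deliberately tiny toy problem. The toy is an assumption of this sketch and is unrelated to the authors' solver or network. One unknown `q` is observed through the phase-only measurement exp(i k q_true) at wavenumber `k`, so the data misfit 2 - 2cos(k(q - q_true)) has spurious minima spaced 2π/k apart. Plain gradient descent at a high wavenumber can settle in one of these spurious minima. Solving at a low wavenumber first, and warm-starting each higher wavenumber from the previous estimate, keeps the iterate in the correct basin. The function names are hypothetical.

```python
import math


def misfit_grad(q, q_true, k):
    """Derivative in q of the toy misfit |exp(i k q) - exp(i k q_true)|^2 = 2 - 2 cos(k (q - q_true))."""
    return 2.0 * k * math.sin(k * (q - q_true))


def solve_at_k(q, q_true, k, steps=200, lr=0.1):
    """Gradient descent at a single wavenumber k, starting from q."""
    # Step size scaled by 1/k^2 because the misfit curvature at its minima is 2 k^2
    for _ in range(steps):
        q -= (lr / k ** 2) * misfit_grad(q, q_true, k)
    return q


def frequency_continuation(q0, q_true, wavenumbers):
    """Sweep wavenumbers from low to high, warm-starting each stage from the previous one."""
    q = q0
    for k in sorted(wavenumbers):
        q = solve_at_k(q, q_true, k)
    return q
```

For example, starting from `q0 = 0` with `q_true = 1`, descent at `k = 8` alone settles about 2π/8 away from the truth. The sweep over `[1, 2, 4, 8]` recovers `q_true`.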

Protein Design with Agent Rosetta: A Case Study for Specialized Scientific Agents

Jacopo Teneggi, Tanya Marwah, A. Bietti, P. Douglas Renfrew, Vikram Mulligan, S. Golkar

Large language models (LLMs) are increasingly capable of emulating reasoning and using tools, creating opportunities for autonomous agents that execute complex scientific tasks. Protein design provides a natural case study: existing deep learning models achieve strong results, but they are typically restricted to canonical amino acids and narrow objectives, leaving space for a generalist tool for broad design pipelines. We introduce Agent Rosetta, an LLM agent built on top of the Rosetta suite, the leading physics-based software for heteropolymer design, capable of modeling non-canonical building blocks and geometries. Agent Rosetta is a single-agent, multi-turn framework that iteratively refines heteropolymers to achieve the goals of a user-defined task brief, combining the biophysical knowledge of modern LLMs with the accuracy of Rosetta's physics-based methods. In evaluations, Agent Rosetta achieves performance comparable to specialized deep learning models, especially when combined with inference-time techniques such as best-of-n sampling. Interestingly, we find that prompt engineering alone is insufficient for reliably producing RosettaScripts actions. This underscores the need for building a comprehensive environment that, for example, simplifies the most challenging aspects of RosettaScripts syntax. These results demonstrate that combining frontier LLMs with established domain-specific scientific tools can yield flexible agentic frameworks that not only lower barriers to use but also achieve performance competitive with specialized deep learning models.

EmbryoProfiler: A Visual Clinical Decision Support System for IVF

Johannes Knittel, Simon Warchol, D. Needleman, et al.

In-vitro fertilization (IVF) has become standard practice to address infertility, which affects more than one in ten couples in the US. However, current protocols yield relatively low success rates of about 20% per treatment cycle. A critical but complex and time-consuming step is the grading and selection of embryos for implantation. Although incubators with time-lapse microscopy have enabled computational analysis of embryo development, existing automated approaches either require extensive manual annotations or use opaque deep learning models that are hard for clinicians to validate and trust. We present EmbryoProfiler, a visual analytics system collaboratively developed with embryologists, biologists, and machine learning researchers to support clinicians in visually assessing embryo viability from time-lapse microscopy imagery. Our system incorporates a deep learning pipeline that automatically annotates microscopy images and extracts clinically interpretable features relevant for embryo grading. Our contributions include: (1) a semi-automatic, visualization-based workflow that guides clinicians through fertilization assessment, developmental timing evaluation, morphological inspection, and comparative analysis of embryos; (2) innovative interactive visualizations, such as cell-shape plots, designed to facilitate efficient analysis of morphological and developmental characteristics; and (3) an integrated, explainable machine learning classifier offering transparent, clinically informed embryo viability scoring to predict live birth outcomes. Quantitative evaluation of our classifier and qualitative case studies conducted with practitioners demonstrate that EmbryoProfiler enables clinicians to make better-informed embryo selection decisions, potentially leading to improved clinical outcomes in IVF treatments.
