2795 Publications

Comparing cryo-EM methods and molecular dynamics simulation to investigate heterogeneity in ligand-bound TRPV1

M. Astore, David Silva-Sánchez, R. Blackwell, P. Cossio, S. Hanson

Cryogenic electron microscopy (cryo-EM) has emerged as a powerful method for resolving the structure of biological macromolecules. Recently, several computational methods have been developed to study the heterogeneity of molecules in single-particle cryo-EM. In this study, we analyze a publicly available dataset of TRPV1 using five such methods: 3DFlex, 3DVA, cryoDRGN, ManifoldEM, and Bayesian ensemble reweighting. We find significant heterogeneity, but each method produces different results, with some detecting only compositional or conformational heterogeneity. To compare these diverse results, we develop AnaVox to quantitatively determine agreement between heterogeneity methods. Furthermore, applying Bayesian ensemble reweighting combined with molecular dynamics simulations supports the presence of these rarer states within the sample. This study shows that although current methods reveal the presence of heterogeneity, their stochasticity and potential bias present challenges for their routine use. However, with future development, these tools will enable the use of cryo-EM data for quantitative biophysical investigations.

Improving Cryo-EM Optimization Robustness with an Optimal Transport Loss Function for Noisy Images

Geoffrey Woollard, David Herreros, P. Cossio, et al.

Many tasks in single-particle cryo-electron microscopy (cryo-EM), such as 2D/3D classification and homo/heterogeneous reconstruction, require optimizing model parameters to minimize the discrepancy between observed data and a forward model. The standard Mean Squared Error (MSE) loss function is computationally efficient but suffers from a non-convex, rugged loss landscape, particularly for high-resolution heterogeneity inference. In this work, we investigate the practical utility of Sliced Wasserstein (SW) distances. We implement exact W2 estimators (inverse-CDF and greedy matching) of projections alongside a computationally efficient proxy based on the L2 norm of CDFs, a formulation akin to the sliced Cramér–von Mises distance. We establish the latter as a robust, fully differentiable workhorse for the cryo-EM forward model. We evaluate its performance against the MSE in joint inference tasks recovering pose, CTF parameters, and conformational heterogeneity. Our results demonstrate that SW significantly broadens the basin of attraction, enabling robust gradient-based optimization from distant initializations where MSE fails. Using a helical spiral toy model, we highlight how SW losses are sensitive to per-particle contrast, where background noise level miscalibration can induce geometric bias in the inferred structure. We show that this bias is manageable through a joint optimization strategy that treats background contrast as a learnable parameter. Finally, we validate the approach on a synthetic dataset using the Zernike3D framework, showing that the SW loss works and yields accurate landscape representations, comparable with MSE. These findings establish SW as a powerful tool for navigating the rugged landscapes of cryo-EM forward model parameters.

December 27, 2025
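
The two ingredients named in the abstract above, an exact 1D W2 estimate via the inverse CDF and a proxy based on the L2 norm of CDFs, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the inputs are assumed to be non-negative profiles (real cryo-EM images are not, which is where the background-contrast issue in the abstract comes in), and only two axis-aligned projections are used in place of a full set of slicing angles.

```python
import numpy as np

def w2_inverse_cdf(x, y):
    """Squared W2 between two equal-size 1D samples (hypothetical helper).

    In 1D the optimal transport plan matches sorted samples, so the
    inverse-CDF estimator reduces to a mean squared difference after sorting.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    return float(np.mean((x - y) ** 2))

def cdf_l2_distance(p, q):
    """Sum of squared differences between the CDFs of two 1D profiles.

    Assumption: p and q are non-negative with positive total mass, so each
    can be normalized into a discrete probability distribution.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    cdf_p = np.cumsum(p / p.sum())
    cdf_q = np.cumsum(q / q.sum())
    return float(np.sum((cdf_p - cdf_q) ** 2))

def sliced_cdf_loss(img_a, img_b):
    """Average CDF-L2 distance over two axis-aligned projections.

    Simplification: a sliced distance would average over many projection
    angles; here only the column sums and row sums are compared.
    """
    a = np.asarray(img_a, dtype=float)
    b = np.asarray(img_b, dtype=float)
    cols = cdf_l2_distance(a.sum(axis=0), b.sum(axis=0))
    rows = cdf_l2_distance(a.sum(axis=1), b.sum(axis=1))
    return 0.5 * (cols + rows)
```

For a single unit spike shifted by k pixels, the summed squared pixel difference is 2 for every k ≥ 1, whereas `cdf_l2_distance` returns k. The CDF-based score therefore keeps growing with displacement, which is one intuition for the broader basin of attraction described in the abstract.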

Age-related nigral downregulation of the Parkinson’s risk factor FAM49B primes human microglia for inflammaging

Jacqueline Martin, C. Park, O. Troyanskaya, et al.

Parkinson’s Disease (PD) is characterized by the loss of dopaminergic neurons in the substantia nigra pars compacta (SNpc), which is associated with changes in microglia function. While age remains the biggest risk factor, the underlying molecular cause of PD onset and its concurrent neuroinflammation are not well understood. Many identified PD risk genes have been directly linked to dopamine neuron impairment, while others are linked to immune cell function. In this study, we found that the PD risk gene FAM49B is critically expressed in microglia of the human SNpc and is downregulated with age and PD. We utilized human and murine microglia cells to demonstrate the role of FAM49B in regulating fundamental microglial functions such as cytoskeletal maintenance, migration, surface adherence, energy homeostasis, autophagy, and, importantly, inflammatory response. Downregulation of microglial FAM49B, as observed in the SNpc of aging individuals, led to significant alterations in these cellular functions, which are associated with increased microglial activation. Thus, our study highlights novel cell-type-specific roles of FAM49B and provides a potential mechanism for susceptibility to the neuroinflammation and reactive gliosis observed in both PD and normal aging.

December 19, 2025

Condensation dynamics of sticky and anchored flexible biopolymers

Cells regulate gene expression in part by forming DNA-protein condensates in the nucleus. While existing theories describe the equilibrium size and stability of such condensates, their dynamics remain less understood. Here, we use coarse-grained 3D Brownian-dynamics simulations to study how long, end-anchored biopolymers condense over time due to transient crosslinking. By tracking how clusters nucleate, merge, and disappear, we identify two dominant dynamical pathways, ripening and merging, that govern the progression from an uncompacted chain to a single condensate. We show how microscopic kinetic parameters, protein density, and mechanical constraints shape these pathways. Using insights from the simulations, we construct a minimal mechanistic free-energy model that captures the observed scaling behavior. Together, these results clarify the dynamical determinants of DNA and chromatin reorganization on timescales relevant to gene regulation.

December 19, 2025
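
As a rough illustration of the simulation setting described in the abstract above, the following sketch advances an end-anchored bead-spring chain by one overdamped Brownian-dynamics (Euler-Maruyama) step. It is a generic textbook scheme, not the authors' model: the function name `bd_step`, the zero-rest-length harmonic bonds, and all parameter names are assumptions, and the transient crosslinking that drives condensation is deliberately left out.

```python
import numpy as np

def bd_step(pos, dt, k_spring, kT, gamma, rng):
    """One Euler-Maruyama step for an end-anchored bead-spring chain.

    pos      : (n_beads, 3) array of bead positions
    dt       : time step
    k_spring : stiffness of the harmonic bonds (assumed zero rest length)
    kT       : thermal energy
    gamma    : friction coefficient per bead
    rng      : numpy.random.Generator supplying the thermal noise
    """
    pos = np.asarray(pos, dtype=float)
    bond = pos[1:] - pos[:-1]            # vectors between neighbouring beads
    force = np.zeros_like(pos)
    force[:-1] += k_spring * bond        # pull bead i toward bead i+1
    force[1:] -= k_spring * bond         # pull bead i+1 toward bead i

    # Thermal kicks with variance 2*kT*dt/gamma per coordinate.
    noise = rng.standard_normal(pos.shape) * np.sqrt(2.0 * kT * dt / gamma)
    new_pos = pos + force * dt / gamma + noise

    # Mechanical constraint: both chain ends stay anchored in place.
    new_pos[0] = pos[0]
    new_pos[-1] = pos[-1]
    return new_pos
```

Calling `bd_step` repeatedly with `rng = np.random.default_rng(seed)` produces a trajectory. A crosslinking model would add bond formation and breakage between non-neighbouring beads on top of this step.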

Disentangled representations via score-based variational autoencoders

Benjamin S. H. Lyo, C. Savin, E. P. Simoncelli

We present the Score-based Autoencoder for Multiscale Inference (SAMI), a method for unsupervised representation learning that combines the theoretical frameworks of diffusion models and VAEs. By unifying their respective evidence lower bounds, SAMI formulates a principled objective that learns representations through score-based guidance of the underlying diffusion process. The resulting representations automatically capture meaningful structure in the data: SAMI recovers ground truth generative factors in our synthetic dataset, learns factorized, semantic latent dimensions from complex natural images, and encodes video sequences into latent trajectories that are straighter than those of alternative encoders, despite training exclusively on static images. Furthermore, SAMI can extract useful representations from pre-trained diffusion models with minimal additional training. Finally, the explicitly probabilistic formulation provides new ways to identify semantically meaningful axes in the absence of supervised labels, and its mathematical exactness allows us to make formal statements about the nature of the learned representation. Overall, these results indicate that implicit structural information in diffusion models can be made explicit and interpretable through synergistic combination with a variational autoencoder.

December 18, 2025

Stabilizing the singularity swap quadrature for near-singular line integrals

David Krantz, A. Barnett, Anna-Karin Tornberg

Singularity swap quadrature (SSQ) is an effective method for the evaluation at nearby targets of potentials due to densities on curves in three dimensions. While highly accurate in most settings, it is known to suffer from catastrophic cancellation when the kernel exhibits both near-vanishing numerators and strong singularities, as arises with scalar double layer potentials or tensorial kernels in Stokes flow or linear elasticity. This precision loss turns out to be tied to the interpolation basis, namely monomial (for open curves) or Fourier (for closed curves). We introduce a simple yet powerful remedy: target-specific translated monomial and Fourier bases that explicitly incorporate the near-vanishing behavior of the kernel numerator. We combine this with a stable evaluation of the constant term which now dominates the integral, significantly reducing cancellation. We show that our approach achieves close to machine precision for prototype integrals, and up to ten orders of magnitude lower error than standard SSQ at extremely close evaluation distances, without significant additional computational cost.

From labels to latents: revealing state-dependent hippocampal computations with Jump Latent Variable Model

S. Zheng, Ipshita Zutshi, Roman Huszár, Yiyao Zhang, Mursel Karadas, György Buzsáki, A. Williams

Neural activity is usually interpreted by imposing external labels (e.g., stimuli or position during locomotion) and decoding within that space (e.g., replay). While powerful, such supervision can mask structure in the data that does not correspond to the label. Unsupervised methods, in turn, often assume smooth latent dynamics and miss genuine discontinuities. We introduce a conceptually simple, computationally efficient latent variable model that infers both (i) the latent variables organizing population activity and (ii) whether their dynamics are continuous or fragmented in time. Fitting reduces to an expectation-maximization (EM) procedure that alternates two operations familiar to systems neuroscience (tuning-curve estimation and label decoding) without requiring external labels. Applied to rodent hippocampal spike recordings, the model reveals distinct population patterns at the same physical position that supervised spatial decoding fails to detect. While learned latents exhibit place-field-like tuning, their reactivation patterns are better distinguished by behavioral states. The model further identifies a continuity-fragmentation axis that characterizes population activity across sleep-wake brain states and is modulated by cholinergic inputs. By not relying on externally imposed spatial labels, our approach exposes structure that supervised approaches obscure and provides a powerful tool for datasets lacking behavioral tracking.

Facilitating analysis of open neurophysiology data on the DANDI Archive using large language model tools

The DANDI Archive is a key resource for sharing open neurophysiology data, hosting over 400 datasets in the Neurodata Without Borders (NWB) format. While these datasets hold tremendous potential for reanalysis and discovery, many researchers face barriers to reuse, including unfamiliarity with access methods and difficulty identifying relevant content. Here we introduce an AI-powered, agentic chat assistant and a notebook generation pipeline. The chat assistant serves as an interactive tool for exploring DANDI datasets. It leverages large language models (LLMs) and integrates with agentic tools to guide users through data access, visualization, and preliminary analysis. The notebook generator analyzes dataset structure with minimal human input, executing inspection scripts and generating visualizations. It then produces an instructional Python notebook tailored to the dataset. We applied this system to 12 recent datasets. Review by neurophysiology data specialists found the generated notebooks to be generally accurate and well-structured, with most notebooks rated as “very helpful.” This work demonstrates how AI can support FAIR principles by leveraging data standards and lowering barriers to data reuse and engagement.
