689 Publications

Methylation Data Analysis and Interpretation

Yuehua Zhu, W. Mao, et al.

DNA methylation, a covalent modification, fundamentally shapes mammalian gene regulation and cellular identity. This review examines methylation's biochemical underpinnings, genomic distribution patterns, and analytical approaches. We highlight three distinctive aspects that separate methylation from other epigenetic marks: its remarkable stability as a silencing mechanism, its capacity to maintain distinct states independently of DNA sequence, and its effectiveness as a quantitative trait linking genotype to disease risk. We also explore the phenomenon of methylation clocks and their biological significance. The review addresses technical considerations across major assay types—both array-based technologies and sequencing approaches—with emphasis on data normalization, quality control, cell proportion inference, and the specialized statistical models required for next-generation sequencing analysis.

Nuclear biophysics: Spatial coordination of transcriptional dynamics?

Tae Yeon Yoo, Bernardo Gouveia, D. Needleman

A great deal is known about biochemical aspects of transcription, but we still lack an understanding of how transcription is causally regulated in space and time. A major unanswered question is the extent to which transcription at different locations in the nucleus is independent or, instead, spatially coordinated. We propose two classes of models of coordination: 1) the shared environment model, in which neighboring loci exhibit coordinated transcriptional dynamics because they share the same local biochemical environment; 2) the mechanical crosstalk model, in which forces propagate from one actively transcribing locus to affect transcription at another. Determining the prevalence of the spatial coordination of transcription, and the underlying mechanisms when it occurs, is an exciting challenge in nuclear biophysics.

Active Liquid Crystal Theory Explains the Collective Organization of Microtubules in Human Mitotic Spindles

Colm P. Kelleher, S. Maddu, Mustafa Basaran, Thomas Müller-Reichert, M. Shelley, D. Needleman

How thousands of microtubules and molecular motors self-organize into spindles remains poorly understood. By combining static, nanometer-resolution, large-scale electron tomography reconstructions and dynamic, optical-resolution, polarized light microscopy, we test an active liquid crystal continuum model of mitotic spindles in human tissue culture cells. The predictions of this coarse-grained theory quantitatively agree with the experimentally measured spindle morphology and fluctuation spectra. These findings argue that local interactions and polymerization produce collective alignment, diffusive-like motion, and polar transport, which govern the behaviors of the spindle's microtubule network, and provide a means to measure the spindle's material properties. This work demonstrates that a coarse-grained theory featuring measurable, physically interpretable parameters can quantitatively describe the mechanical behavior and self-organization of human mitotic spindles.

July 29, 2025

The physical consequences of sperm gigantism

The male fruit fly produces ~1.8 mm long sperm, thousands of which can be stored until mating in a ~200 micron sac, the seminal vesicle. While the evolutionary pressures driving such extreme sperm (flagellar) lengths have long been investigated, the physical consequences of their gigantism are unstudied. Through high-resolution three-dimensional reconstructions of in vivo sperm morphologies and rapid live imaging, we discovered that stored sperm are organized into a dense and highly aligned state. The packed flagella exhibit system-wide collective 'material' flows, with persistent and slow-moving topological defects; individual sperm, despite their extraordinary lengths, propagate rapidly through the flagellar material, moving in either direction along material director lines. To understand how these collective behaviors arise from the constituents' nonequilibrium dynamics, we conceptualize the motion of individual sperm as topologically confined to a reptation-like tube formed by its neighbors. Therein, sperm propagate through observed amplitude-constrained and internally driven flagellar bending waves, pushing off counter-propagating neighbors. From this conception, we derive a continuum theory that produces an extensile material stress that can sustain an aligned flagellar material. Experimental perturbations and simulations of active elastic filaments verify our theoretical predictions. Our findings suggest that active stresses in the flagellar material maintain the sperm in an unentangled, hence functional state, in both sexes, and establish giant sperm in their native habitat as a novel and physiologically relevant active matter system.

July 25, 2025

Large protein databases reveal structural complementarity and functional locality

Paweł Szczerbiak, Lukasz M. Szydlowski, D. Renfrew, et al.

Recent breakthroughs in protein structure prediction have led to a surge in high-quality 3D models, highlighting the need for efficient computational solutions. In our work, we examine the structural clusters from the AlphaFold Protein Structure Database (AFDB), a high-quality subset of ESMAtlas, and the Microbiome Immunity Project (MIP). We create a single cohesive low-dimensional representation of the resulting protein space. We show that, while each database occupies distinct regions, they collectively exhibit significant overlap in their functional profiles. High-level biological functions tend to cluster in particular regions, revealing a shared functional landscape despite the diverse sources of data. By creating a representation of protein structure space, localizing functional annotations within this space, and providing an open-access web-server for exploration, this work offers insights for future research concerning protein sequence-structure-function relationships, enabling biological questions to be asked about taxonomic assignments, environmental factors, or functional specificity. This approach is generalizable, thus enabling further discovery beyond findings presented here.

Representational drift and learning-induced stabilization in the piriform cortex

Guillermo B. Morales, Miguel A. Muñoz, Y. Tu

The brain encodes external stimuli through patterns of neural activity, forming internal representations of the world. Growing experimental evidence shows that neural representations for a specific stimulus can change over time in a phenomenon called “representational drift” (RD). However, the underlying mechanisms for this widespread phenomenon remain poorly understood. Here, we study RD in the piriform cortex of the olfactory system with a realistic neural network model that incorporates two general mechanisms for synaptic weight dynamics operating at two well-separated timescales: spontaneous multiplicative fluctuations on a scale of days and spike-timing-dependent plasticity (STDP) effects on a scale of seconds. We show that the slow multiplicative fluctuations in synaptic sizes, which lead to a steady-state distribution of synaptic weights consistent with experiments, can induce RD effects that are in quantitative agreement with recent empirical evidence. Furthermore, our model reveals that the fast STDP learning dynamics during presentation of a given odor drive the system toward a low-dimensional representational manifold, which effectively reduces the dimensionality of synaptic weight fluctuations and thus suppresses RD. Specifically, our model explains why representations of already “learned” odors drift more slowly than unfamiliar ones, as well as the dependence of the drift rate on the frequency of stimulus presentation, both of which align with recent experimental data. The proposed model not only offers a simple explanation for the emergence of RD and its relation to learning in the piriform cortex, but also provides a general theoretical framework for studying representation dynamics in other neural systems.

Prediction of local convergent shifts in evolutionary rates with phyloConverge

Elysia Saputra, W. Mao, et al.

Convergence analysis can characterize genetic elements underlying morphological adaptations. However, its performance on regulatory elements is limited due to their modular composition of transcription factor motifs, which have rapid turnover and experience different evolutionary pressures.

We introduce phyloConverge, a phylogenetic method that performs scalable, fine-grained local convergence analysis of genomic elements at flexible length scales. Using a benchmarking case of convergent subterranean mammal adaptation, phyloConverge identifies rate-accelerated conserved noncoding elements (CNEs) with high specificity and statistical robustness relative to competing methods. From CNE-level scoring, we detect the convergent regression of entire CNE units and highlight the contrast that subterranean-associated coding region regression is highly specific to ocular functions, whereas regulatory element regression is enriched for accompanying neuronal phenotypes and other developmental processes. From transcription factor motif-level scoring, we dissect elements into subregions with uneven convergence signals and demonstrate the modular adaptation of CNEs with high functional specificity. Finally, we demonstrate phyloConverge’s scalability to perform high-resolution convergence analysis genome-wide.

The Fruit Fly Auxodrome: a computer vision setup for longitudinal studies of Drosophila development

Changyuan Wang, Denis F Faerberg, S. Shvartsman, Robert A Marmion

Studies in Drosophila have contributed a great deal to our understanding of developmental mechanisms. Indeed, familiar names of critical signaling components, such as Hedgehog and Notch, have their origins in the readily identifiable morphological phenotypes of Drosophila. Most studies that led to the identification of these and many other highly conserved genes were based on the end-point phenotypes, such as the larval cuticle or the adult wing. Additional information can be extracted from longitudinal studies, which can reveal how the phenotypes emerge over time. Here we present the Fruit Fly Auxodrome, an experimental setup that enables monitoring and quantitative analysis of the entirety of development of 96 individually housed Drosophila from hatching to eclosion. The Auxodrome combines an inexpensive live imaging setup and a computer vision pipeline that provides access to a wide range of quantitative information, such as the times of hatching and pupation, as well as dynamic patterns of larval activity. We demonstrate the Auxodrome in action by recapitulating several previously reported features of wild-type development as well as developmental delay in a Drosophila model of a human disease. The scalability of the presented design makes it readily suitable for large-scale longitudinal studies in multiple developmental contexts.

The open-source Masala software suite: Facilitating rapid methods development for synthetic heteropolymer design

Tristan Zaborniak, B. Turzo, D. Renfrew, V. Mulligan, et al.

Although canonical protein design has benefited from machine learning methods trained on databases of protein sequences and structures, synthetic heteropolymer design still relies heavily on physics-based methods. The Rosetta software, which provides diverse physics-based methods for designing sequences, exploring conformations, docking molecules, and performing analysis, has proven invaluable to this field. Nevertheless, Rosetta’s aging architecture, monolithic structure, non-open source code, and steep development learning curve are beginning to hinder new methods development. Here, we introduce the Masala software suite, a free, open-source set of C++ libraries intended to extend Rosetta and other software, and ultimately to be a successor to Rosetta. Masala is structured for modern computing hardware, and its build system automates the creation of application programming interface (API) layers, permitting Masala’s use as an extension library for existing software, including Rosetta. Masala features modular architecture in which it is easy for novice developers to add new plugin modules, which can be independently compiled and loaded at runtime, extending functionality of software linking Masala without source code alteration. Here, we describe implementation of Masala modules that accelerate protein and synthetic peptide design. We describe the implementation of Masala real-valued local optimizers and cost function network optimizers that can be used as drop-in replacements for Rosetta’s minimizer and packer when designing heteropolymers. We explore design-centric guidance terms for promoting desirable features, such as hydrogen bond networks, or discouraging undesirable features, such as unsatisfied buried hydrogen bond donors and acceptors, which we have re-implemented far more efficiently in Masala, providing up to two orders of magnitude of speedup in benchmarks. Finally, we discuss development goals for future versions of Masala.

Decomposition of phenotypic heterogeneity in autism reveals underlying genetic programs

Aviya Litman, N. Sauerwald, C. Park, O. Troyanskaya, et al.

Unraveling the phenotypic and genetic complexity of autism is extremely challenging yet critical for understanding the biology, inheritance, trajectory and clinical manifestations of the many forms of the condition. Using a generative mixture modeling approach, we leverage broad phenotypic data from a large cohort with matched genetics to identify robust, clinically relevant classes of autism and their patterns of core, associated and co-occurring traits, which we further validate and replicate in an independent cohort. We demonstrate that phenotypic and clinical outcomes correspond to genetic and molecular programs of common, de novo and inherited variation and further characterize distinct pathways disrupted by the sets of mutations in each class. Remarkably, we discover that class-specific differences in the developmental timing of affected genes align with clinical outcome differences. These analyses demonstrate the phenotypic complexity of children with autism, identify genetic programs underlying their heterogeneity, and suggest specific biological dysregulation patterns and mechanistic hypotheses.
