CCB: Publications

Decomposition of phenotypic heterogeneity in autism reveals distinct and coherent genetic programs

Aviya Litman, N. Sauerwald, C. Park, Y. Hao, O. Troyanskaya, et al.

Unraveling the phenotypic and genetic complexity of autism is extremely challenging yet critical for understanding the biology, inheritance, trajectory, and clinical manifestations of the many forms of the condition. Here, we leveraged broad phenotypic data from a large cohort with matched genetics to characterize classes of autism and their patterns of core, associated, and co-occurring traits, ultimately demonstrating that phenotypic patterns are associated with distinct genetic and molecular programs. We used a generative mixture modeling approach to identify robust, clinically-relevant classes of autism which we validate and replicate in a large independent cohort. We link the phenotypic findings to distinct patterns of de novo and inherited variation which emerge from the deconvolution of these genetic signals, and demonstrate that class-specific common variant scores strongly align with clinical outcomes. We further provide insights into the distinct biological pathways and processes disrupted by the sets of mutations in each class. Remarkably, we discover class-specific differences in the developmental timing of genes that are dysregulated, and these temporal patterns correspond to clinical milestone and outcome differences between the classes. These analyses embrace the phenotypic complexity of children with autism, unraveling genetic and molecular programs underlying their heterogeneity and suggesting specific biological dysregulation patterns and mechanistic hypotheses.

CryoBench: Diverse and challenging datasets for the heterogeneity problem in cryo-EM

Minkyu Jeon, M. Astore, S. Hanson, P. Cossio, et al.

Cryo-electron microscopy (cryo-EM) is a powerful technique for determining high-resolution 3D biomolecular structures from imaging data. As this technique can capture dynamic biomolecular complexes, 3D reconstruction methods are increasingly being developed to resolve this intrinsic structural heterogeneity. However, the absence of standardized benchmarks with ground truth structures and validation metrics limits the advancement of the field. Here, we propose CryoBench, a suite of datasets, metrics, and performance benchmarks for heterogeneous reconstruction in cryo-EM. We propose five datasets representing different sources of heterogeneity and degrees of difficulty. These include conformational heterogeneity generated from simple motions and random configurations of antibody complexes and from tens of thousands of structures sampled from a molecular dynamics simulation. We also design datasets containing compositional heterogeneity from mixtures of ribosome assembly states and 100 common complexes present in cells. We then perform a comprehensive analysis of state-of-the-art heterogeneous reconstruction tools including neural and non-neural methods and their sensitivity to noise, and propose new metrics for quantitative comparison of methods. We hope that this benchmark will be a foundational resource for analyzing existing methods and new algorithmic development in both the cryo-EM and machine learning communities.

Consider the power of interdisciplinary collaborations to improve embryo selection

D. Needleman, Catherine Racowsky, Ph.D.

Unraveling the Molecular Complexity of N-Terminus Huntingtin Oligomers: Insights into Polymorphic Structures

Neha Nanajkar, A. Sahoo, Silvina Matysiak

Huntington’s disease (HD) is a fatal neurodegenerative disorder resulting from an abnormal expansion of polyglutamine (polyQ) repeats in the N-terminus of the huntingtin protein. When the polyQ tract surpasses 35 repeats, the mutated protein undergoes misfolding, culminating in the formation of intracellular aggregates. Research in mouse models suggests that HD pathogenesis involves the aggregation of N-terminal fragments of the huntingtin protein (htt). These early oligomeric assemblies of htt, exhibiting diverse characteristics during aggregation, are implicated as potential toxic entities in HD. However, a consensus on their specific structures remains elusive. Understanding the heterogeneous nature of htt oligomers provides crucial insights into disease mechanisms, emphasizing the need to identify various oligomeric conformations as potential therapeutic targets. Employing coarse-grained molecular dynamics, our study aims to elucidate the mechanisms governing the aggregation process and resultant aggregate architectures of htt. The polyQ tract within htt is flanked by two regions: an N-terminal domain (N17) and a short C-terminal proline-rich segment. We conducted self-assembly simulations involving five distinct N17 + polyQ systems with polyQ lengths ranging from 7 to 45, utilizing the ProMPT force field. Prolongation of the polyQ domain correlates with an increase in β-sheet-rich structures. Longer polyQ lengths favor intramolecular β-sheets over intermolecular interactions due to the folding of the elongated polyQ domain into hairpin-rich conformations. Importantly, variations in polyQ length significantly influence resulting oligomeric structures. Shorter polyQ domains lead to N17 domain aggregation, forming a hydrophobic core, while longer polyQ lengths introduce a competition between N17 hydrophobic interactions and polyQ polar interactions, resulting in densely packed polyQ cores with outwardly distributed N17 domains. 
Additionally, at extended polyQ lengths, we observe distinct oligomeric conformations with varying degrees of N17 bundling. These findings can help explain the toxic gain-of-function that htt with expanded polyQ acquires.

Computational tools for cellular scale biophysics

D. Stein, M. Shelley

Mathematical models are indispensable for disentangling the interactions through which biological components work together to generate the forces and flows that position, mix, and distribute proteins, nutrients, and organelles within the cell. To illuminate the ever more specific questions studied at the edge of biological inquiry, such models inevitably become more complex. Solving, simulating, and learning from these more realistic models requires the development of new analytic techniques, numerical methods, and scalable software. In this review, we discuss some recent developments in tools for understanding how large numbers of cytoskeletal filaments, driven by molecular motors and interacting with the cytoplasm and other structures in their environment, generate fluid flows, instabilities, and material deformations which help drive crucial cellular processes.

Deciphering missense coding variants with AlphaMissense

Z. Pan, Chandra L. Theesfeld

Genetic diagnosis promises to guide treatment and manage expectations for patients and physicians. Yet even when a variant in a disease gene is identified, the assignment of pathogenic impact is not always possible (1). Of the 215 million possible substitutions in approximately 19,900 genes, 71 million are missense mutations that result in an amino acid substitution rather than a stop codon or a frameshift (2). Only 4 million missense variants have been observed, of which approximately 2% have been clinically classified as pathogenic or benign by testing companies and collected in the public ClinVar repository. The rest are classified as variants of uncertain significance (VUS) due to the dearth of information on the functional impact or pathogenic consequences of the mutation.
A key challenge is to understand how changes in protein sequence affect function and contribute to disease. While the development of mutational scanning assays enables scientists to test thousands of substitutions at a time in cell lines, it is not possible to experimentally test all mutations, let alone assess fitness in humans. To meet this challenge, computational approaches that integrate many types of information and can predict functional impacts are becoming increasingly sophisticated in their ability to accurately classify variants.
An early and powerful strategy for modeling the pathogenic impacts of variants employed evolutionary sequence information through multiple sequence alignments (MSA). This approach examines sequence conservation across species and within humans, as demonstrated in models like PolyPhen and SIFT (3). The integration of functional insights related to protein domains and functions, coupled with artificial intelligence, further enhances these models (3). Prediction of a correct 3-dimensional protein structure has long been a grail in research. Marks et al. (4) suggested a global statistical model to massively reduce the search space of protein conformations by linking the pairwise correlations from MSA to fold a protein into a correct 3-dimensional structure (directly from Marks et al. (4)). AlphaFold (5) marked a significant advancement in the field by using a large language model (LLM) to associate protein structure with MSA with unprecedented accuracy, effectively solving the “protein folding problem.” The ability of protein LLMs to learn not just amino acid relationships in linear sequences but also extremely rich relationships in any number of dimensions and contexts powers such models.
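The variant counts quoted in the first paragraph of this abstract can be turned into proportions with a few lines of arithmetic. The sketch below is only illustrative: the variable names are hypothetical, the inputs are the rounded figures from the text, and it assumes that "the rest" refers to the remaining observed missense variants.

```python
# Rounded figures quoted in the abstract (approximate, not exact counts).
possible_substitutions = 215_000_000  # all possible substitutions
possible_missense = 71_000_000        # possible missense substitutions
observed_missense = 4_000_000         # missense variants observed so far
classified_fraction = 0.02            # "approximately 2%" clinically classified

# Share of all possible substitutions that are missense.
missense_share = possible_missense / possible_substitutions

# Share of possible missense substitutions that have been observed.
observed_share = observed_missense / possible_missense

# Observed variants with a clinical classification, and the remainder
# (assumed here to be the variants of uncertain significance).
classified = round(observed_missense * classified_fraction)
unclassified_observed = observed_missense - classified

print(f"Missense share of possible substitutions: {missense_share:.1%}")
print(f"Observed share of possible missense: {observed_share:.1%}")
print(f"Clinically classified: {classified:,}")
print(f"Observed but unclassified: {unclassified_observed:,}")
```

Under these rounded inputs, roughly a third of possible substitutions are missense, and the clinically classified set is on the order of 80,000 variants, which is why computational classification of the remainder is the focus of the piece.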

Dynamical arrest in active nematic turbulence

I. Lavi, Ricard Alert, et al.

Active fluids display spontaneous turbulent-like flows known as active turbulence. Recent work revealed that these flows have universal features, independent of the material properties and of the presence of topological defects. However, the differences between defect-laden and defect-free active turbulence remain largely unexplored. Here, by means of large-scale numerical simulations, we show that defect-free active nematic turbulence can undergo dynamical arrest. We find that flow alignment -- the tendency of nematics to reorient under shear -- enhances large-scale jets in contractile rodlike systems while promoting arrested flow patterns in extensile systems. Our results reveal a mechanism of labyrinthine pattern formation produced by an emergent topology of nematic domain walls that partially suppresses chaotic flows. Taken together, our findings call for the experimental realization of defect-free active nematics, and suggest that topological defects enable turbulence by preventing dynamical arrest.

Cytoplasmic stirring by active carpets

B. Chakrabarti, M. Rachh, S. Shvartsman, M. Shelley

Large cells often rely on cytoplasmic flows for intracellular transport, maintaining homeostasis, and positioning cellular components. Understanding the mechanisms of these flows is essential for gaining insights into cell function, developmental processes, and evolutionary adaptability. Here, we focus on a class of self-organized cytoplasmic stirring mechanisms that result from fluid–structure interactions between cytoskeletal elements at the cell cortex. Drawing inspiration from streaming flows in late-stage fruit fly oocytes, we propose an analytically tractable active carpet theory. This model deciphers the origins and three-dimensional spatiotemporal organization of such flows. Through a combination of simulations and weakly nonlinear theory, we establish the pathway of the streaming flow to its global attractor: a cell-spanning vortical twister. Our study reveals the inherent symmetries of this emergent flow, its low-dimensional structure, and illustrates how complex fluid–structure interaction aligns with classical solutions in Stokes flow. This framework can be easily adapted to elucidate a broad spectrum of self-organized, cortex-driven intracellular flows.

E. coli do not count single molecules

H. Mattingly, Keita Kamino, Jude Ong, et al.

Organisms must perform sensory-motor behaviors to survive. What bounds or constraints limit behavioral performance? Previously, we found that the gradient-climbing speed of a chemotaxing Escherichia coli is near a bound set by the limited information they acquire from their chemical environments (1). Here we ask what limits their sensory accuracy. Past theoretical analyses have shown that the stochasticity of single molecule arrivals sets a fundamental limit on the precision of chemical sensing (2). Although it has been argued that bacteria approach this limit, direct evidence is lacking. Here, using information theory and quantitative experiments, we find that E. coli’s chemosensing is not limited by the physics of particle counting. First, we derive the physical limit on the behaviorally-relevant information that any sensor can get about a changing chemical concentration, assuming that every molecule arriving at the sensor is recorded. Then, we derive and measure how much information E. coli’s signaling pathway encodes during chemotaxis. We find that E. coli encode two orders of magnitude less information than an ideal sensor limited only by shot noise in particle arrivals. These results strongly suggest that constraints other than particle arrival noise limit E. coli’s sensory fidelity.

Yardangs sculpted by erosion of heterogeneous material

Samuel Boury, S. Weady, Leif Ristroph, et al.

The recognizable shapes of landforms arise from processes such as erosion by wind or water currents. However, explaining the physical origin of natural structures is challenging due to the coupled evolution of complex flow fields and three-dimensional (3D) topographies. We investigate these issues in a laboratory setting inspired by yardangs, which are raised, elongate formations whose characteristic shape suggests erosion of heterogeneous material by directional flows. We combine experiments and simulations to test an origin hypothesis involving a harder or less erodible inclusion embedded in an outcropping of softer material. Optical scans of clay objects fixed within flowing water reveal a transformation from a featureless mound to a yardang-like form resembling a lion in repose. Phase-field simulations reproduce similar shape dynamics and show their dependence on the erodibility contrast and flow strength. Through visualizations of the flow fields and analysis of the local erosion rate, we identify effects associated with flow funneling and the turbulent wake that are responsible for carving the unique geometrical features. This highly 3D scouring process produces complex shapes from simple and commonplace starting conditions and is thus a candidate explanation for natural yardangs. The methods introduced here should be generally useful for geomorphological problems and especially those for which material heterogeneity is a primary factor.
