2795 Publications

Learning Compositional Functions with Transformers from Easy-to-Hard Data

Zixuan Wang, Eshaan Nichani, A. Bietti, Alex Damian, Daniel Hsu, Jason D. Lee, D. Wu

Transformer-based language models have demonstrated impressive capabilities across a range of complex reasoning tasks. Prior theoretical work exploring the expressive power of transformers has shown that they can efficiently perform multi-step reasoning tasks involving parallelizable computations. However, the learnability of such constructions, particularly the conditions on the data distribution that enable efficient learning via SGD, remains an open question. Towards answering this question, we study the learnability of a task called the \emph{$k$-fold composition}, which requires computing an interleaved composition of $k$ input permutations and $k$ hidden permutations, and can be expressed by a transformer with $O(\log k)$ layers. On the negative front, we provide a Statistical Query lower bound showing that any learner which is trained on samples from the $k$-fold composition task and makes polynomially many queries must have sample size exponential in $k$, thus establishing a statistical-computational gap. On the other hand, we show that this function class can be efficiently learned, with runtime and sample complexity polynomial in $k$, by gradient descent on an $O(\log k)$-depth transformer via two different curriculum learning strategies: one in which data consists of $k'$-fold composition functions with $k' \le k$ presented in increasing order of difficulty, and another in which all data is presented simultaneously. Our work sheds light on the necessity and sufficiency of having both easy and hard examples in the data distribution for transformers to learn complex compositional tasks.
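The abstract defines the $k$-fold composition only in words. The sketch below illustrates the general idea under stated assumptions: the interleaving order (hidden permutation first, then input permutation, repeated $k$ times) and all function names (`interleave`, `tree_composition`, and so on) are hypothetical choices for illustration, not the paper's formal definition. The pairwise tree reduction shows informally why the computation is parallelizable: composition is associative, so $2k$ steps collapse in $\lceil \log_2 2k \rceil$ rounds, loosely mirroring the $O(\log k)$-depth construction.

```python
import math
import random
from typing import List, Sequence, Tuple

Perm = List[int]  # permutation of range(n): p[i] is the image of i


def compose(p: Sequence[int], q: Sequence[int]) -> Perm:
    """Return p after q, i.e. the map i -> p[q[i]]."""
    return [p[q[i]] for i in range(len(q))]


def interleave(inputs: Sequence[Perm], hidden: Sequence[Perm]) -> List[Perm]:
    """Assumed order of application: hidden[0], inputs[0], hidden[1], inputs[1], ..."""
    if len(inputs) != len(hidden):
        raise ValueError("need k input and k hidden permutations")
    steps: List[Perm] = []
    for sigma, pi in zip(hidden, inputs):
        steps.extend([list(sigma), list(pi)])
    return steps


def sequential_composition(steps: Sequence[Perm]) -> Perm:
    """Apply the 2k steps one at a time (2k sequential compositions)."""
    result = list(range(len(steps[0])))  # identity permutation
    for s in steps:
        result = compose(s, result)  # apply s after everything so far
    return result


def tree_composition(steps: Sequence[Perm]) -> Tuple[Perm, int]:
    """Combine adjacent pairs in rounds; returns (result, number of rounds).

    Associativity means all pairs in a round are independent, so 2k steps
    collapse in ceil(log2(2k)) rounds.
    """
    layer = [list(s) for s in steps]
    rounds = 0
    while len(layer) > 1:
        merged = [compose(layer[i + 1], layer[i])  # layer[i] is applied first
                  for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2 == 1:
            merged.append(layer[-1])  # odd element carries over unchanged
        layer = merged
        rounds += 1
    return layer[0], rounds


# Usage with random permutations of 5 elements and k = 4.
rng = random.Random(0)
n, k = 5, 4
hidden = [rng.sample(range(n), n) for _ in range(k)]
inputs = [rng.sample(range(n), n) for _ in range(k)]
steps = interleave(inputs, hidden)
result, rounds = tree_composition(steps)
assert result == sequential_composition(steps)
print(rounds, math.ceil(math.log2(2 * k)))  # 3 3
```

Both routines compute the same permutation; only the dependency structure differs, which is the property a shallow network can exploit.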

Variations in neuronal selectivity create efficient representational codes

Our visual capabilities depend on neural response properties in visual areas of our brains. Neurons exhibit a wide variety of selective response properties, but the reasons for this diversity are unknown. Here, we related the distribution of neuronal tuning properties to the information capacity of the population. Our results from theory, simulations, and analysis of recordings from macaque primary visual cortex (V1) reveal that diversity of amplitude and bandwidth drive complementary changes to the representational geometry of a population. Amplitude diversity pushes the centers of the representations further apart, whereas bandwidth heterogeneity decorrelates the center locations. These geometric changes separate out representations for distinct stimuli, creating more efficient encoding. We study how both types of diversity affect the population code for two different perceptual tasks: discrimination and identification. While both types of diversity improve encoding for both tasks, their distinct impacts on geometry make each more beneficial for one of the two tasks. Amplitude diversity impacts coding efficiency more for discrimination than it does for identification, while bandwidth diversity has a stronger impact on identification. These complementary effects indicate the importance of both types of diversity for perception. Finally, because tuning diversity exists across species and brain areas, our results suggest a fundamental neural coding strategy that may be applicable to a wide range of behavior.
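To make "amplitude diversity" and "bandwidth diversity" concrete, here is a minimal sketch assuming von Mises orientation tuning curves and independent Poisson spiking. Both are modeling assumptions of this example and not necessarily those of the study. It computes the population Fisher information $I(\theta) = \sum_i f_i'(\theta)^2 / f_i(\theta)$, a standard measure for fine discrimination. Identification would need a different measure, which is not covered here. The helper names and parameter ranges are hypothetical.

```python
import math
import random
from typing import List, Tuple

Neuron = Tuple[float, float, float]  # (preferred orientation, amplitude, kappa)


def _bump(theta: float, pref: float, amp: float, kappa: float) -> float:
    # von Mises bump for orientation (period pi); larger kappa = narrower bandwidth
    return amp * math.exp(kappa * (math.cos(2 * (theta - pref)) - 1))


def rate(theta: float, neuron: Neuron, base: float = 1.0) -> float:
    pref, amp, kappa = neuron
    return base + _bump(theta, pref, amp, kappa)


def rate_slope(theta: float, neuron: Neuron) -> float:
    # analytic derivative of rate() with respect to theta
    pref, amp, kappa = neuron
    return _bump(theta, pref, amp, kappa) * kappa * (-2 * math.sin(2 * (theta - pref)))


def fisher_information(theta: float, population: List[Neuron]) -> float:
    """Independent Poisson neurons: I(theta) = sum_i f_i'(theta)^2 / f_i(theta)."""
    return sum(rate_slope(theta, nrn) ** 2 / rate(theta, nrn) for nrn in population)


def make_population(n, amp_range, kappa_range, seed=0) -> List[Neuron]:
    """Evenly spaced preferences; amplitude and kappa drawn uniformly from the ranges."""
    rng = random.Random(seed)
    return [(math.pi * i / n, rng.uniform(*amp_range), rng.uniform(*kappa_range))
            for i in range(n)]


def mean_fisher(population: List[Neuron], n_probe: int = 180) -> float:
    # average over evenly spaced probe orientations in [0, pi)
    probes = [math.pi * j / n_probe for j in range(n_probe)]
    return sum(fisher_information(t, population) for t in probes) / n_probe


# Same mean amplitude (10) and mean kappa (2); only the spread differs.
homogeneous = make_population(36, (10.0, 10.0), (2.0, 2.0))
amp_diverse = make_population(36, (5.0, 15.0), (2.0, 2.0))
bw_diverse = make_population(36, (10.0, 10.0), (1.0, 3.0))
for name, pop in [("homogeneous", homogeneous), ("amplitude", amp_diverse),
                  ("bandwidth", bw_diverse)]:
    print(name, round(mean_fisher(pop), 2))
```

Centering each range on the homogeneous value keeps the comparison about spread rather than mean. The printed numbers depend on these arbitrary choices and are not a reproduction of the paper's results.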

Variations in neuronal selectivity create efficient representational geometries for perception

Our visual capabilities depend on neural response properties in visual areas of our brains. Neurons exhibit a wide variety of selective response properties, but the reasons for this diversity are unknown. Here, we related the distribution of neuronal tuning properties to the information capacity of the population. Our results from theory, simulations, and analysis of recordings from macaque primary visual cortex (V1) reveal that diversity of amplitude and bandwidth drive complementary changes to the representational geometry of a population. Amplitude diversity pushes the centers of the representations further apart, whereas bandwidth heterogeneity decorrelates the center locations. These geometric changes separate out representations for distinct stimuli, creating more efficient encoding. We study how both types of diversity affect the population code for two different perceptual tasks: discrimination and identification. While both types of diversity improve encoding for both tasks, their distinct impacts on geometry make each more beneficial for one of the two tasks. Amplitude diversity impacts coding efficiency more for discrimination than it does for identification, while bandwidth diversity has a stronger impact on identification. These complementary effects indicate the importance of both types of diversity for perception. Finally, because tuning diversity exists across species and brain areas, our results suggest a fundamental neural coding strategy that may be applicable to a wide range of behavior.

Sequestration of ribosome biogenesis factors in HSV-1 nuclear aggregates revealed by spatially resolved thermal profiling

Peter J. Metzger, Tavis J. Reed, O. Troyanskaya

Viruses exploit host cell reliance on compartmentalization to facilitate their replication. Herpes simplex virus type 1 (HSV-1) modulates the subcellular localization of host proteins to suppress immune activation, license viral gene expression, and achieve translational shutoff. To spatially resolve dynamic protein-protein interaction (PPI) networks during infection with an immunostimulatory HSV-1 strain, we integrated nuclear/cytoplasmic fractionation with thermal proximity coaggregation analysis (N/C-TPCA). The resulting expanded depth and spatial resolution of PPIs charted compartment-specific assemblies of protein complexes throughout infection. We find that a broader suite of host chaperones than previously anticipated exhibits nuclear recruitment to form condensates known as virus-induced chaperone-enriched (VICE) domains. Monitoring protein and RNA constituents and ribosome activity, we establish that VICE domains sequester ribosome biogenesis factors from ribosomal RNA, accompanying a cell-wide defect in ribosome supply. These findings highlight infection-driven VICE domains as nodes of translational remodeling and demonstrate the utility of N/C-TPCA to study dynamic biological contexts.
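Thermal proximity coaggregation relies on the observation that proteins in the same complex tend to coaggregate when heated, so their melting curves resemble each other. Below is a minimal sketch that assumes each protein's thermal profile is a vector of soluble fractions on a shared temperature grid and scores pairs by Euclidean distance between curves. That scoring is one simple choice and is not necessarily the exact N/C-TPCA statistic. The protein names and values are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Profile = List[float]  # soluble fraction at each temperature, 1.0 at the lowest


def curve_distance(a: Profile, b: Profile) -> float:
    """Euclidean distance between two melting curves on the same temperature grid."""
    if len(a) != len(b):
        raise ValueError("profiles must share the temperature grid")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_pairs(profiles: Dict[str, Profile]) -> List[Tuple[str, str, float]]:
    """All protein pairs within one compartment, most similar curves first."""
    scored = [(p, q, curve_distance(profiles[p], profiles[q]))
              for p, q in combinations(sorted(profiles), 2)]
    return sorted(scored, key=lambda t: t[2])


# Hypothetical profiles measured in one fraction (e.g. the nuclear fraction).
nuclear = {
    "A": [1.0, 0.95, 0.80, 0.50, 0.20, 0.05],
    "B": [1.0, 0.94, 0.78, 0.48, 0.22, 0.06],
    "C": [1.0, 1.00, 0.98, 0.90, 0.70, 0.40],
}
for p, q, d in rank_pairs(nuclear):
    print(f"{p}-{q}: {d:.3f}")
```

Running `rank_pairs` separately on nuclear and cytoplasmic profiles is what would let a pair score as coaggregating in one compartment but not in the other.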

A live-cell biosensor of in vivo receptor tyrosine kinase activity reveals feedback regulation of a developmental gradient

Emily K. Ho, Rebecca P. Kim-Yip, S. Shvartsman, et al.

A lack of tools for detecting receptor activity in vivo has limited our ability to fully explore receptor-level control of developmental patterning. Here, we extend phospho-tyrosine tag (pYtag) biosensors to visualize endogenous receptor tyrosine kinase (RTK) activity in Drosophila. We build biosensors for three RTKs that function across developmental stages and tissues. By characterizing Torso::pYtag during embryonic terminal patterning, we find that Torso activity differs from downstream extracellular signal-regulated kinase (ERK) activity in two surprising ways. First, Torso activity is narrowly restricted to the poles, yet it produces a broader ERK gradient. Second, Torso activity decreases over developmental time while ERK activity is sustained, an effect mediated by ERK pathway-dependent negative feedback. Our results suggest that a narrow domain of Torso activity, tuned in amplitude by negative feedback, locally activates signaling effectors, which diffuse through the syncytial embryo to form the ERK gradient. Altogether, the results of this work highlight the usefulness of pYtags for investigating receptor-level regulation of developmental patterning.
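The proposed mechanism, a narrow domain of receptor activity whose effectors diffuse away to form a broader gradient, can be illustrated with a toy one-dimensional model. This is a minimal sketch under assumed conditions, not the study's model: constant diffusion coefficient `D`, first-order decay rate `k`, production only within a few cells at the pole, and no-flux ends, with all values hypothetical. Outside the source, the steady profile decays roughly as $e^{-x/\lambda}$ with $\lambda = \sqrt{D/k}$, so the signal extends past the source whenever $\lambda$ exceeds the source width.

```python
import math
from typing import List


def steady_gradient(n: int = 100, dx: float = 1.0, D: float = 1.0, k: float = 0.01,
                    source_width: int = 5, production: float = 1.0,
                    t_end: float = 2000.0) -> List[float]:
    """Integrate dc/dt = D d2c/dx2 - k c + s(x) by explicit Euler up to t_end.

    s(x) = production in the first `source_width` cells (the pole) and 0 elsewhere;
    both ends are no-flux. t_end = 2000 is 20 decay times, close to steady state.
    """
    dt = 0.4 * dx * dx / D  # explicit scheme needs dt < dx^2 / (2 D)
    c = [0.0] * n
    for _ in range(int(t_end / dt)):
        new = c[:]
        for i in range(n):
            left = c[i - 1] if i > 0 else c[i]  # mirror value: zero flux at the ends
            right = c[i + 1] if i < n - 1 else c[i]
            lap = (left - 2 * c[i] + right) / (dx * dx)
            s = production if i < source_width else 0.0
            new[i] = c[i] + dt * (D * lap - k * c[i] + s)
        c = new
    return c


profile = steady_gradient()
decay_length = math.sqrt(1.0 / 0.01)  # lambda = sqrt(D / k) = 10 cells
half_max_cells = sum(1 for v in profile if v >= profile[0] / 2)
print(f"source: 5 cells, lambda: {decay_length:.0f} cells, half-max extent: {half_max_cells} cells")
```

Negative feedback on receptor amplitude, as the abstract proposes, would scale `production` down over time. This toy keeps it constant.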

Microtubules in Martini: Parameterizing a heterogeneous elastic-network towards a mechanically accurate microtubule

Microtubules are essential cytoskeletal filaments involved in cell motility, division, and intracellular transport, exhibiting complex structural dynamics governed by diverse biophysical factors. Atomistic simulations of microtubule assemblies remain challenging due to their extensive spatiotemporal scales. To address this, we present a multiscale approach combining the primarily top-down Martini 3 coarse-grained (CG) model with an appropriately parameterized heterogeneous elastic network to capture microtubule mechanics and molecular detail efficiently. By iteratively tuning the elastic network, we matched the structural fluctuations of CG heterodimeric building blocks to atomistic reference data, reproducing experimentally consistent mechanical properties. This framework helped us identify stabilizing long-lived interactions between charged C-terminal tails and the folded domain of neighboring tubulin subunits, offering insight into sequence-specific contributions to lattice stability. Our efforts culminated in the construction of a 200 nm microtubule composed of million interaction centers, enabling exploration of large-scale microtubule-associated processes with amino acid-level resolution. This work bridges the gap between molecular specificity and computational scalability, offering a platform for simulating biophysical processes across cellular length and time scales.
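The iterative tuning step can be illustrated on a toy system. This minimal sketch assumes each elastic-network bond behaves as an independent one-dimensional harmonic spring, so equipartition gives a length variance of $k_B T / k$. It also assumes a damped multiplicative update $k \leftarrow k\,(\sigma^2_{\mathrm{CG}} / \sigma^2_{\mathrm{ref}})^{\alpha}$. Neither is stated in the abstract. In a real Martini workflow the CG fluctuations would come from a simulation run at each iteration and bonds would be coupled. `simulate_fluctuations`, the bond names, and every parameter value are hypothetical.

```python
from typing import Dict, Tuple

KT = 2.494  # k_B T in kJ/mol at about 300 K


def simulate_fluctuations(force_constants: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical stand-in for running a CG simulation.

    Treats every elastic bond as an independent 1D harmonic spring, so
    equipartition gives a length variance of kT / k (nm^2 for k in kJ/mol/nm^2).
    """
    return {bond: KT / k for bond, k in force_constants.items()}


def tune_network(reference: Dict[str, float], k_init: float = 500.0, alpha: float = 0.5,
                 tol: float = 1e-4, max_iter: int = 100) -> Tuple[Dict[str, float], int]:
    """Rescale force constants until CG variances match the reference variances.

    A bond that fluctuates more than the reference is too soft, so its force
    constant is scaled up, and vice versa; alpha < 1 damps each update.
    """
    ks = {bond: k_init for bond in reference}
    for iteration in range(1, max_iter + 1):
        cg = simulate_fluctuations(ks)
        if max(abs(cg[b] / reference[b] - 1) for b in reference) < tol:
            return ks, iteration
        ks = {b: ks[b] * (cg[b] / reference[b]) ** alpha for b in reference}
    return ks, max_iter


# Hypothetical atomistic reference variances (nm^2) for two elastic bonds.
reference = {"bond_1": 0.010, "bond_2": 0.004}
ks, n_iter = tune_network(reference)
print(n_iter, {b: round(k, 1) for b, k in ks.items()})
```

With `alpha = 0.5` the log-ratio error halves each iteration in this toy. In a coupled network a damped update like this helps avoid overshooting when changing one bond shifts its neighbors' fluctuations.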

Spatial-Temporal Pre-Training for Embryo Viability Prediction Using Time-Lapse Videos

Zhiyi Shi, Junsik Kim, D. Needleman, et al.

Automating embryo viability prediction for in vitro fertilization (IVF) is important but challenging due to the limited availability of labeled pregnancy outcome data, as only a small fraction of embryos are labeled after transfer. Self-supervised learning (SSL) can leverage both labeled and unlabeled data to improve prediction. However, existing SSL methods for videos are not directly applicable to embryo development videos due to two challenges: (1) embryo time-lapse videos contain hundreds of frames, requiring significant GPU memory for conventional SSL; (2) the dataset contains videos with varying lengths and many outlier frames, causing traditional video alignment methods to struggle with semantic misalignment. We propose Spatial-Temporal Pre-Training (STPT) to address these challenges. STPT includes two stages: spatial and temporal. In each stage, only one encoder is trained while the other is frozen, reducing memory demands. To handle temporal misalignment, STPT avoids frame-by-frame alignment across videos. The spatial stage learns from alignments within each video and its temporally consistent augmentations. The temporal stage then models relationships between video embeddings. Our method efficiently handles long videos and temporal variability. On 23,027 time-lapse videos (3,286 labeled), STPT achieves the highest AUC of 0.635 (95% CI: 0.632-0.638) compared to baselines, with limited computational resources.
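The reported metric, area under the ROC curve, equals the probability that a randomly chosen positive example (here, an embryo with a positive pregnancy outcome) scores higher than a randomly chosen negative one, with ties counting one half. Below is a minimal sketch of that computation plus a percentile-bootstrap confidence interval. The abstract does not say how its 95% CI was obtained, so the bootstrap is an assumption, and the scores in the usage lines are made up.

```python
import random
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney estimate of ROC AUC: P(score_pos > score_neg), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:  # O(P * N) pairwise comparison; fine for a sketch
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores: Sequence[float], labels: Sequence[int], n_boot: int = 1000,
                 level: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap over examples; resamples lacking a class are redrawn."""
    if not 0 < sum(labels) < len(labels):
        raise ValueError("need both classes")
    rng = random.Random(seed)
    idx = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]  # draw with replacement
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


scores = [0.9, 0.7, 0.65, 0.4, 0.35, 0.2, 0.6, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(auc(scores, labels))  # 0.875
print(bootstrap_ci(scores, labels, n_boot=200))
```

With thousands of labeled videos, a sort-based rank computation would replace the quadratic loop. The pairwise form is kept here because it states the definition directly.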

June 20, 2025

RocketSHP: Ultra-fast Proteome-scale Prediction of Protein Dynamics

Proteins are dynamic molecules that depend on conformational flexibility to carry out functions in the cell, yet despite significant advances in the modeling of static protein structure, prediction of these dynamics remains challenging. We introduce RocketSHP, a machine learning model that predicts dynamic protein properties from sequence or static structure with unprecedented speed and accuracy. Trained on thousands of molecular dynamics trajectories spanning diverse protein families, RocketSHP simultaneously models multiple dynamics features: root-mean-square fluctuations (RMSF), generalized correlation coefficients (GCC-LMI), and a novel structural heterogeneity profile (SHP) based on recent structure quantization methods. RocketSHP significantly outperforms existing methods in predicting simulation-derived dynamics. We reduce RMSF prediction error by 57% compared to BioEmu and calibrated Dyna-1 predictions, including an up to 73% error reduction for long proteins. We validate these predictions with experimental hetNOE data, and we demonstrate the ability to adapt predictions to different physical temperatures. We highlight RocketSHP’s utility in constructing allosteric networks in the oncogene KRAS and identify structural sub-modules with correlated motions, and we validate RocketSHP by showing that changes in node centrality within predicted KRAS allosteric networks correlate with changes of folding free energy in experimental DMS data. Our approach makes predictions in seconds rather than hours or days, enabling us to perform the first comprehensive dynamics analysis of the entire human proteome. RocketSHP bridges the gap between static structural biology and dynamic functional understanding, enabling dynamics-aware structural analysis and variant effect prediction at scales previously unavailable. RocketSHP is available as free and open-source software at https://github.com/flatironinstitute/RocketSHP.
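For reference, the simulation-derived RMSF that RocketSHP learns to predict is, for residue $i$, $\mathrm{RMSF}_i = \sqrt{\langle \lVert \mathbf{r}_i(t) - \langle \mathbf{r}_i \rangle \rVert^2 \rangle_t}$. The minimal sketch below computes that target from a trajectory, assuming frames have already been superposed on a common reference (the alignment step is omitted) and that each residue is represented by one atom, such as its Cα. It illustrates the quantity only and is not RocketSHP's own code. The toy coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def rmsf(frames: List[List[Vec]]) -> List[float]:
    """Per-residue root-mean-square fluctuation about the time-averaged position.

    frames[t][i] is the (x, y, z) position of residue i in frame t; frames are
    assumed pre-aligned so global rotation and translation are already removed.
    """
    n_frames = len(frames)
    n_res = len(frames[0])
    out = []
    for i in range(n_res):
        mean = [sum(f[i][d] for f in frames) / n_frames for d in range(3)]
        msd = sum(sum((f[i][d] - mean[d]) ** 2 for d in range(3)) for f in frames) / n_frames
        out.append(math.sqrt(msd))
    return out


# Toy trajectory: residue 0 is rigid, residue 1 moves +/-1 along x about x = 3.
frames = [
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]
print(rmsf(frames))  # [0.0, 1.0]
```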

June 17, 2025

RocketSHP: Ultra-fast Proteome-scale Prediction of Protein Dynamics

Proteins are dynamic molecules that depend on conformational flexibility to carry out functions in the cell, yet despite significant advances in the modeling of static protein structure, prediction of these dynamics remains challenging. We introduce RocketSHP, a machine learning model that predicts dynamic protein properties from sequence or static structure with unprecedented speed and accuracy. Trained on thousands of molecular dynamics trajectories spanning diverse protein families, RocketSHP simultaneously models multiple dynamics features: root-mean-square fluctuations (RMSF), generalized correlation coefficients (GCC-LMI), and a novel structural heterogeneity profile (SHP) based on recent structure quantization methods. RocketSHP significantly outperforms existing methods in predicting simulation-derived dynamics. We reduce RMSF prediction error by 57% compared to BioEmu and calibrated Dyna-1 predictions, including an up to 73% error reduction for long proteins. We validate these predictions with experimental hetNOE data, and we demonstrate the ability to adapt predictions to different physical temperatures. We highlight RocketSHP’s utility in constructing allosteric networks in the oncogene KRAS and identify structural sub-modules with correlated motions, and we validate RocketSHP by showing that changes in node centrality within predicted KRAS allosteric networks correlate with changes of folding free energy in experimental DMS data. Our approach makes predictions in seconds rather than hours or days, enabling us to perform the first comprehensive dynamics analysis of the entire human proteome. RocketSHP bridges the gap between static structural biology and dynamic functional understanding, enabling dynamics-aware structural analysis and variant effect prediction at scales previously unavailable. RocketSHP is available as free and open-source software at https://github.com/flatironinstitute/RocketSHP.
