NSF-Simons Southeast Center for Mathematics and Biology – Georgia Institute of Technology
- Christine Heitsch — PI/Director
- Hang Lu — Co-I/Associate Director
NSF-Simons Center for Mathematical and Statistical Analysis of Biology – Harvard University
- Andrew Murray — PI/Chair and Center Director
NSF-Simons Center for Multiscale Cell Fate Research – University of California, Irvine
- Qing Nie – PI/Director
NSF-Simons Center for Quantitative Biology – Northwestern University
- Richard Carthew — PI/Center Director
- Jun Allard, University of California, Irvine
- Natasha Jonoska, University of South Florida
- William Kath, Northwestern University
- Carole LaBonne, Northwestern University
- Samantha Petti, Harvard University
- Maksim Pilkus, University of California, Irvine
- Vardhan Satalkar, Georgia Tech
- Krishna Shrinivas, Harvard University
- Francesca Storici, Georgia Tech
- Matt Torres, Georgia Tech
The NSF-Simons Research Centers for Mathematics of Complex Biological Systems (MathBioSys) initiative created innovative, collaborative research centers at the intersection of mathematics and molecular, cellular, and organismal biology, establishing new connections between mathematical sciences and biological sciences and promoting interdisciplinary education and workforce training. The annual meeting will cover recent progress and provide an arena for discussion and engagement.
Thursday, April 13th
9:30 AM Jun Allard | Biophysics of T cell activation on fluid dynamic, gene regulatory, and ultra-slow timescales
11:00 AM Maksim Pilkus | Spatial-Temporal Control of Tissue Regeneration in the Skin
1:00 PM Krishna Shrinivas | The Many Phases of a Cell
2:30 PM Samantha Petti | End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman
4:00 PM Carole LaBonne | Quantitative analysis of transcriptome dynamics provides novel insights into developmental state transitions
Friday, April 14th
9:30 AM William Kath | Quantifying dynamical changes and statistics of sparse, noisy, high-dimensional genomics data
11:00 AM Matt Torres and Vardhan Satalkar | Illuminating Protein Phosphorylation-mediated Secondary Structure Transitions with Generative Adversarial Networks
1:00 PM Natasha Jonoska and Francesca Storici | RNA-mediated double-strand break repair in human cells
Jun Allard, University of California, Irvine
Biophysics of T cell activation on fluid dynamic, gene regulatory, and ultra-slow timescales
A precise mathematical understanding of immune cells, especially T cells, is of increasing importance, due in part to their role in cell-based immunotherapies. First, to understand the system at the cellular level, we combine precisely controlled time-series data with neural network differential equations, in which individual terms of the dynamical system are universal neural networks. This approach enables learning from intermediate-sized data sets, avoids the bias that arises from choosing functional forms in traditional differential equation modeling, and still reveals the signaling network architecture. Second, to understand the initial steps of T cell contact with an antigen-presenting cell, we develop a novel computational fluid dynamics scheme, revealing a hydrodynamic thin-layer effect at the interface between the two cells that might be exploited by cytotoxic “killer” T cells. Finally, we use a stochastic rare-event algorithm based on a weighted ensemble of simulations to model the long-timescale rearrangement of surface molecules on the T cell (driven primarily by diffusion), and infer the consequences of this rearrangement for T cell signaling. The T cell thus offers opportunities to study novel mathematical, computational, and biophysical phenomena across multiple timescales.
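The abstract does not describe the specific rare-event algorithm used in the talk. As a hedged illustration of the general weighted-ensemble idea (each simulated walker carries a statistical weight, and within bins along a progress coordinate walkers are split or merged so that rarely visited regions stay populated while total probability is conserved), here is a minimal Python sketch. The 1D biased random walk, the bin width, and the names `bin_of`, `resample`, and `run` are hypothetical placeholders, not the T cell surface-molecule model discussed in the talk.

```python
import math
import random
from collections import defaultdict

BIN_WIDTH = 0.5  # hypothetical bin width along a 1D progress coordinate


def bin_of(x):
    """Map a walker's position to an integer bin index."""
    return math.floor(x / BIN_WIDTH)


def resample(walkers, target_per_bin, rng):
    """Split/merge walkers so every occupied bin holds target_per_bin walkers.

    walkers: list of (position, weight) pairs. Total weight is conserved.
    """
    bins = defaultdict(list)
    for pos, w in walkers:
        bins[bin_of(pos)].append((pos, w))
    out = []
    for members in bins.values():
        # Split: replace the heaviest walker with two copies of half its weight.
        while len(members) < target_per_bin:
            members.sort(key=lambda pw: pw[1])
            pos, w = members.pop()
            members += [(pos, w / 2), (pos, w / 2)]
        # Merge: combine the two lightest walkers. The survivor's position is
        # chosen with probability proportional to weight, and it keeps the sum.
        while len(members) > target_per_bin:
            members.sort(key=lambda pw: pw[1])
            (p1, w1), (p2, w2) = members[0], members[1]
            keep = p1 if rng.random() < w1 / (w1 + w2) else p2
            members = [(keep, w1 + w2)] + members[2:]
        out.extend(members)
    return out


def run(steps=100, target_per_bin=4, seed=0):
    """Toy weighted-ensemble run: biased 1D random walks with periodic resampling."""
    rng = random.Random(seed)
    walkers = [(0.0, 1.0 / target_per_bin) for _ in range(target_per_bin)]
    for _ in range(steps):
        # Hypothetical dynamics: small drift toward negative x plus Gaussian
        # noise, so reaching large positive x is a rare event.
        walkers = [(x + rng.gauss(-0.05, 0.2), w) for x, w in walkers]
        walkers = resample(walkers, target_per_bin, rng)
    return walkers


walkers = run()
print(len(walkers), sum(w for _, w in walkers))
```

Choosing the merge survivor in proportion to its weight keeps the resampling step from biasing the expected weight distribution, while splitting keeps low-probability bins sampled by several walkers, which is what makes such schemes useful for slow, diffusion-driven rearrangements.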
Quantifying dynamical changes and statistics of sparse, noisy, high-dimensional genomics data
The circadian clock orchestrates a vast array of behavioral and physiological processes with a 24-hour cycle, enabling organisms to anticipate and adapt to the Earth’s day. Entrainable by environmental cues, the rhythm itself is generated by a self-sustained molecular oscillator, present in nearly every cell, that governs the expression of thousands of genes, precisely coordinating biological processes at the microscopic scale in a manner that is both reliable and flexible enough to adapt to environmental changes. Today, high-throughput omics assays enable us to probe these mechanisms in molecular detail, with the goal of inferring which genes are under circadian control and how their dynamics change under different conditions. Yet analyzing these data raises new challenges, chiefly that of characterizing dynamics when the data are noisy, sparsely sampled in time, and possibly not strictly periodic. Current methods make unrealistic assumptions (such as stationarity of residuals) or rely on “template” waveforms that limit the scope of discovery. In this talk, I will discuss our recent work on nonparametric methods that analyze circadian transcriptomic data and overcome these challenges by exploiting and extending results from dynamical systems theory and topological data analysis.
Quantitative analysis of transcriptome dynamics provides novel insights into developmental state transitions
During embryogenesis, the developmental potential of initially pluripotent cells becomes progressively restricted as they transit to lineage-restricted states. The pluripotent cells of Xenopus blastula-stage embryos are an ideal system in which to study cell state transitions during developmental decision-making, as gene expression dynamics and chromatin state can be followed at high temporal resolution. We have developed an experimental platform and quantitative framework in which pluripotent cells explanted from blastula-stage Xenopus embryos can be used to study the transit of these cells to many distinct lineage states. We have used this system to study transcriptome dynamics during the transition to four different lineage states (epidermis, neural progenitor, endoderm, and ventral mesoderm), each across six time points during this seven-hour process, yielding 72 observations in a 45,661-dimensional gene expression space. These data provide quantitative insights into the dynamics of Waddington’s landscape, help explain why the neural progenitor state is the default lineage state for pluripotent cells, and identify novel components of lineage-specific gene regulation. They also reveal an unexpected overlap in the transcriptional responses to BMP4/7 and Activin signaling and provide mechanistic insight into how signaling inputs such as BMP are temporally controlled to ensure correct lineage decisions during development.
Matt Torres, Georgia Institute of Technology
Vardhan Satalkar, Georgia Institute of Technology
Illuminating Protein Phosphorylation-mediated Secondary Structure Transitions with Generative Adversarial Networks
Protein structure and protein function are inextricably linked. Emerging evidence suggests that enzymatically catalyzed protein modifications in regions that lack a defined structure (Intrinsically Disordered Regions; IDRs) can regulate protein function by promoting disorder-to-order (D/O) transitions. However, the physicochemical properties that enable such regulation are unknown. To better understand this process, we are developing models to identify IDRs that can undergo D/O transitions following phosphorylation, a type of protein modification.
In phase I of the project, we developed a 3-class machine learning discriminator for protein secondary structures, along with a 2-class Generative Adversarial Network (GAN) for the design of beta-hairpin secondary structures. Sequences generated by the GAN model were validated as unique with respect to both the training data and all other protein sequences in the comprehensive NCBI database. Several generated sequences were also evaluated using molecular dynamics simulations and experimental circular dichroism analysis. In phase II of the project, we are developing an updated model that can design peptides capable of undergoing phosphorylation-mediated D/O transitions. Because little experimental data exist for such transitions, we are generating a synthetic training data set using molecular dynamics simulations and high-performance computing.
Natasha Jonoska, University of South Florida; Francesca Storici, Georgia Institute of Technology
RNA-mediated double-strand break repair in human cells
Double-strand breaks (DSBs) in DNA are challenging lesions to repair. Human cells employ at least three DSB repair mechanisms, with a preference for non-homologous end joining (NHEJ) over homologous recombination (HR) and microhomology-mediated end joining (MMEJ). In contrast to HR, NHEJ and MMEJ do not use a DNA template molecule to recover damaged or lost nucleotides. NHEJ directly ligates broken DNA ends, while MMEJ exploits the alignment of short microhomologies on either side of the DSB and is associated with deletion of the sequence between the microhomologies. Whether, and to what extent, transcript RNA plays a direct role in DSB repair mechanisms in mammalian cells has been unknown. Here, we show that both sense and antisense transcript RNA facilitate DSB repair in a sequence-specific manner in human cells. Depending on its sequence complementarity with the broken DNA ends, the transcript RNA can promote the repair of a DSB or gap in the gene from which it was transcribed via NHEJ or MMEJ, or mediate RNA-templated repair. The transcript RNA influences DSB repair by NHEJ and MMEJ even when the transcription level is low. These results demonstrate an unexpected role of transcript RNA in directing how DSBs are repaired in human cells and in maintaining genome stability.
Youngkyu Jeon (1), Margherita Maria Ferrari (2, 3, 6), Tejasvi Channagiri (2, 6), Penghao Xu (1, 6), Sathya Balachander (1, 4), Vivian S. Park (5), Stefania Marsili (1), Zachary F. Pursell (5), Nataša Jonoska (2) & Francesca Storici (1)
(1) School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia; (2) Department of Mathematics and Statistics, University of South Florida, Tampa, Florida; (3) currently at Department of Mathematics, University of Manitoba, Winnipeg, Canada; (4) currently at Emory University, Atlanta, Georgia; (5) Department of Biochemistry and Molecular Biology, Tulane Cancer Center, Tulane University School of Medicine, New Orleans, Louisiana; (6) these authors contributed equally.
End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman
Multiple Sequence Alignments (MSAs) of homologous sequences contain information on structural and functional constraints and their evolutionary histories. Despite their importance for many downstream tasks, such as structure prediction, MSA generation is often treated as a separate pre-processing step, without any guidance from the application it will be used for. Here, we implement a smooth and differentiable version of the Smith-Waterman pairwise alignment algorithm that enables jointly learning an MSA and a downstream machine learning system in an end-to-end fashion. To demonstrate its utility, we introduce SMURF (Smooth Markov Unaligned Random Field), a new method that jointly learns an alignment and the parameters of a Markov Random Field for unsupervised contact prediction. We find that SMURF learns MSAs that mildly improve contact prediction on a diverse set of protein and RNA families. As a proof of concept, we demonstrate that by connecting our differentiable alignment module to AlphaFold and maximizing predicted confidence, we can learn MSAs that improve structure predictions over the initial MSAs. Interestingly, the alignments that improve AlphaFold predictions are self-inconsistent and can be viewed as adversarial. This work highlights the potential of differentiable dynamic programming to improve neural network pipelines that rely on an alignment and the potential dangers of relying on black-box methods for optimizing predictions of protein sequences.
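As a rough illustration of what a "smooth and differentiable" Smith-Waterman means, the sketch below replaces each max in the local-alignment recursion with a temperature-scaled log-sum-exp. It assumes a linear gap penalty and a simple match/mismatch similarity (the abstract specifies neither), and the helper names `smooth_max`, `similarity`, `smith_waterman`, and `soft_alignment` are hypothetical. Gradients are estimated by finite differences purely for transparency; this is not the SMURF implementation, and a practical version would use automatic differentiation instead.

```python
import math


def smooth_max(values, temperature):
    """Temperature-scaled log-sum-exp: a differentiable stand-in for max()."""
    m = max(values)
    return m + temperature * math.log(
        sum(math.exp((v - m) / temperature) for v in values))


def similarity(a, b, match=2.0, mismatch=-1.0):
    """Hypothetical per-position similarity matrix for sequences a and b."""
    return [[match if x == y else mismatch for y in b] for x in a]


def smith_waterman(sim, gap=-2.0, temperature=None):
    """Local alignment score: hard max if temperature is None, else smooth max."""
    agg = max if temperature is None else (lambda vs: smooth_max(vs, temperature))
    n, m = len(sim), len(sim[0])
    # H[i][j]: (soft) best score of a local alignment ending at cell (i, j);
    # -inf on the boundary means no alignment ends there.
    H = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = sim[i - 1][j - 1]
            H[i][j] = agg([
                s,                    # start a new alignment with this pair
                H[i - 1][j - 1] + s,  # extend diagonally (match or mismatch)
                H[i - 1][j] + gap,    # a[i-1] aligned to a gap
                H[i][j - 1] + gap,    # b[j-1] aligned to a gap
            ])
    # A local alignment may end at any cell.
    return agg([H[i][j] for i in range(1, n + 1) for j in range(1, m + 1)])


def soft_alignment(sim, gap=-2.0, temperature=1.0, eps=1e-5):
    """d(score)/d(sim[i][j]) by central differences.

    For the smooth score this equals the probability that positions i and j
    are aligned when alignments are weighted by exp(score / temperature).
    """
    grad = [[0.0] * len(sim[0]) for _ in sim]
    for i in range(len(sim)):
        for j in range(len(sim[0])):
            up = [row[:] for row in sim]
            up[i][j] += eps
            down = [row[:] for row in sim]
            down[i][j] -= eps
            grad[i][j] = (smith_waterman(up, gap, temperature)
                          - smith_waterman(down, gap, temperature)) / (2 * eps)
    return grad


sim = similarity("ACG", "ACG")
print(smith_waterman(sim))  # 6.0
print(smith_waterman(sim, temperature=1.0))
print(soft_alignment(sim))
```

As the temperature goes to zero the smooth score approaches the hard Smith-Waterman score. The useful property for end-to-end learning is that the gradient with respect to the similarity matrix is a soft alignment matrix, so a loss computed downstream of the alignment can flow back into whatever model produces the similarities.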
Maksim Pilkus, University of California, Irvine
Spatial-Temporal Control of Tissue Regeneration in the Skin
Regeneration is a fundamental property of adult tissues that enables them to restore anatomical integrity and functional competence lost to injury or disease. At its maximum, regenerative healing restores tissues to a replica of their embryonic state. However, in many animal species regenerative healing is only partial and competes with an alternative form of healing, scarring, which produces an anatomical and functional substitute tissue. The mechanism of regeneration, and how it competes with the mechanism of scarring, is complex, involving numerous epigenetic and signaling components that are dynamic (i) across many cell types, (ii) over the time course of healing, and (iii) across tissue space. We use skin as the model system for studying regeneration, combining computational modeling, bioinformatic analysis of single-cell data, intravital imaging, and functional genetic assays. We use the model of cyclical hair regeneration to study the mechanism of tissue growth control, both in space (e.g., how hair follicles grow to their finite steady-state size) and in time (e.g., how hair follicles control the length of their hair fibre and regulate the frequency of hair cycles). We also use the model of hair neogenesis, whereby new hair follicles regenerate in skin after deep wounding, to study the spatiotemporal establishment of embryonic-like signaling patterns for hair morphogenesis and the role of wound-resident fibroblasts and macrophages in this process. Throughout my presentation I will use several examples that integrate computational analyses with in vivo experiments on the hair cycle and hair neogenesis to demonstrate how a multiscale approach yields new insights into tissue regeneration mechanisms.
The Many Phases of a Cell
Cells routinely orchestrate reactions, interactions, and transport among billions of biomolecules in a crowded environment to perform the diverse tasks that underpin life. Rather than occurring in a well-mixed milieu, biomolecules self-organize into dozens of membrane-lacking compartments called condensates that enable key biological functions and are aberrant in disease. I will begin my talk by introducing phase transitions as an emerging paradigm underlying condensate assembly and function in cells. I will then highlight challenges that limit our understanding of condensates, which, unlike equilibrium oil-water mixtures, are highly multicomponent, multiphasic biomolecular assemblies driven out of equilibrium by fluxes and forces. Through specific examples, I will propose an interdisciplinary and collaborative approach that advances our understanding of condensates by bridging non-equilibrium thermodynamics, physical chemistry, and soft matter physics in the cellular context. In the first part of my talk, I will discuss past and ongoing work that links condensates to the regulation of gene expression, focusing on the emerging role of non-equilibrium RNA synthesis in genome and nuclear organization. In the second part, I will discuss complementary efforts to build quantitative physical frameworks that enable prediction and design of emergent multiphase behavior in multicomponent fluids. I will conclude with a brief summary and a discussion of exciting future directions and opportunities for translation.