CCB: Publications

Inference of Bacterial Small RNA Regulatory Networks and Integration with Transcription Factor-Driven Regulatory Networks

M Arrieta Ortiz, C Hafemeister, B Shuster, N Baliga, R. Bonneau

Small noncoding RNAs (sRNAs) are key regulators of bacterial gene expression. Through complementary base pairing, sRNAs affect mRNA stability and translation efficiency. Here, we describe a network inference approach designed to identify sRNA-mediated regulation of transcript levels. We use existing transcriptional data sets and prior knowledge to infer sRNA regulons using our network inference tool, the Inferelator. This approach produces genome-wide gene regulatory networks that include contributions by both transcription factors and sRNAs. We show the benefits of estimating and incorporating sRNA activities into network inference pipelines using available experimental data. We also demonstrate how these estimated sRNA regulatory activities can be mined to identify the experimental conditions where sRNAs are most active. We uncover 45 novel experimentally supported sRNA-mRNA interactions in Escherichia coli, outperforming previous network-based efforts. Additionally, our pipeline complements sequence-based sRNA-mRNA interaction prediction methods by adding a data-driven filtering step. Finally, we show the general applicability of our approach by identifying 24 novel, experimentally supported, sRNA-mRNA interactions in Pseudomonas aeruginosa, Staphylococcus aureus, and Bacillus subtilis. Overall, our strategy generates novel insights into the functional context of sRNA regulation in multiple bacterial species.

Macromolecular modeling and design in Rosetta: recent methods and frameworks

J. Koehler, B Weitzner, R. Bonneau

Trapping, gliding, vaulting: transport of semiflexible polymers in periodic post arrays

B. Chakrabarti, C. Gaillard, D. Saintillan

The transport of deformable particles through porous media underlies a wealth of applications ranging from filtration to oil recovery to the transport and spreading of biological agents. Using direct numerical simulations, we analyze the dynamics of semiflexible polymers under the influence of an imposed flow in a structured two-dimensional lattice serving as an idealization of a porous medium. This problem has received much attention in the limit of reptation and for long-chain polymer molecules such as DNA that are transported through micropost arrays for electrophoretic chromatographic separation. In contrast to long entropic molecules, the dynamics of elastic polymers results from a combination of scattering with the obstacles and flow-induced buckling instabilities. We identify three dominant modes of transport that involve trapping, gliding and vaulting of the polymers around the obstacles, and we reveal their essential features using tools from dynamical systems theory. The interplay of these scattering dynamics with transport and deformations in the imposed flow results in the long-time asymptotic dispersion of the center of mass, which we quantify in terms of a hydrodynamic dispersion tensor. We then discuss a simple yet efficient chromatographic device that exploits the competition between different modes of transport to sort filaments in a dilute suspension according to their lengths.

Genome-wide landscape of RNA-binding protein dysregulation reveals a major impact on psychiatric disorder risk

C. Park, J Zhou, A. Wong, K. Chen, C Theesfeld, R Darnell, O. Troyanskaya

Despite the strong genetic basis of psychiatric disorders, the molecular origins of these diseases are still largely unmapped. RNA-binding proteins (RBPs) are responsible for most post-transcriptional regulation, from splicing to translation to localization. RBPs thus act as key gatekeepers of cellular homeostasis, especially in the brain. Here, we leverage a deep learning approach to interrogate variant effects genome-wide, and discover that the dysregulation of RBP target sites is a principal contributor to psychiatric disorder risk. We show that specific modes of RBP regulation are genetically linked to the heritability of psychiatric disorders, and demonstrate that diverse RBP regulatory functions are reflected in distinct genome-wide negative selection signatures. Notably, RBP dysregulation has a stronger impact on psychiatric disorders than common coding region variants and explains heritability not currently captured by large-scale molecular QTL studies (expression QTLs and splicing QTLs). We share genome-wide profiles of RBP target site dysregulation, which we used to identify DDHD2 as a candidate schizophrenia risk gene, in a public web server. This resource provides a novel analytical framework to connect the full range of RNA regulation to complex disease.

Excess dNTPs Trigger Oscillatory Surface Flow in the Early Drosophila Embryo

S. Dutta, N. Djabrayan, C. Smits, C. Rowley, S. Shvartsman

During the first 2 hours of Drosophila development, precisely orchestrated nuclear cleavages, cytoskeletal rearrangements, and directed membrane growth lead to the formation of an epithelial sheet around the yolk. The newly formed epithelium remains relatively quiescent during the next hour as it is patterned by maternal inductive signals and zygotic gene products. We discovered that this mechanically quiet period is disrupted in embryos with high levels of dNTPs, which have been recently shown to cause abnormally fast nuclear cleavages and interfere with zygotic transcription. High levels of dNTPs are associated with robust onset of oscillatory two-dimensional flows during the third hour of development. Tissue cartography, particle image velocimetry, and dimensionality reduction techniques reveal that these oscillatory flows are low dimensional and are characterized by the presence of spiral vortices. We speculate that these aberrant flows emerge through an instability triggered by deregulated mechanical coupling between the nascent epithelium and three-dimensional yolk. These results highlight an unexplored connection between a core metabolic process and large-scale mechanics in a rapidly developing embryo.

Microtubule re-organization during female meiosis in C. elegans

Ina Lantzsch, Che-Hang Yu, Hossein Yazdkhasti, Norbert Lindow, Erik Szentgyörgyi, Steffen Prohaska, Martin Srayko, S. Fürthauer, Stefanie Redmann

The female meiotic spindles of most animals are acentrosomal and undergo drastic morphological changes while transitioning from metaphase to anaphase. The ultra-structure of acentrosomal spindles, and how this enables such dramatic rearrangements, remains largely unknown. To address this, we applied light microscopy, large-scale electron tomography and mathematical modeling to female meiotic C. elegans spindles undergoing the transition from metaphase to anaphase. Combining these approaches, we find that meiotic spindles are dynamic arrays of short microtubules that turn over on second time scales. The results show that the transition from metaphase to anaphase correlates with an increase in the number of microtubules and a decrease of their average length. To understand the mechanisms that drive this transition, we developed a mathematical model for the microtubule length distribution that considers microtubule growth, catastrophe, and severing. Using Bayesian inference to compare model predictions and data, we find that microtubule turn-over is the major driver of the observed large-scale reorganizations. Our data suggest that cutting of microtubules occurs, but that most microtubules are not severed before undergoing catastrophe.

Better together: Elements of successful scientific software development in a distributed collaborative community

J. Koehler, B Weitzner, D. Renfrew, S Lewis, R Moretti, A Watkins, V. Mulligan, S Lyskov, J Adolf-Bryfogle, J Labonte, J Krys, Rosetta Commons Consortium, W Schief, D Gront, O Schueler-Furman, D Baker, J Gray, R Dunbrack, T Kortemme, A Leaver-Fay, C Strauss, J Meiler, B Kuhlman, J Gray, R. Bonneau

Many scientific disciplines rely on computational methods for data analysis, model generation, and prediction. Implementing these methods is often accomplished by researchers with domain expertise but without formal training in software engineering or computer science. This arrangement has led to underappreciation of sustainability and maintainability of scientific software tools developed in academic environments. Some software tools have avoided this fate, including the scientific library Rosetta. We use this software and its community as a case study to show how modern software development can be accomplished successfully, irrespective of subject area. Rosetta is one of the largest software suites for macromolecular modeling, with 3.1 million lines of code and many state-of-the-art applications. Since the mid 1990s, the software has been developed collaboratively by the RosettaCommons, a community of academics from over 60 institutions worldwide with diverse backgrounds including chemistry, biology, physiology, physics, engineering, mathematics, and computer science. Developing this software suite has provided us with more than two decades of experience in how to effectively develop advanced scientific software in a global community with hundreds of contributors. Here we illustrate the functioning of this development community by addressing technical aspects (like version control, testing, and maintenance), community-building strategies, diversity efforts, software dissemination, and user support. We demonstrate how modern computational research can thrive in a distributed collaborative community. The practices described here are independent of subject area and can be readily adopted by other software development communities.

DeepArk: modeling cis-regulatory codes of model species with deep learning

E Cofer, J Raimundo, A Tadych, Y Yamazaki, A. Wong, C Theesfeld, M Levine, O. Troyanskaya

To enable large-scale analyses of regulatory logic in model species, we developed DeepArk (https://DeepArk.princeton.edu), a set of deep learning models of the cis-regulatory codes of four widely studied species: Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, and Mus musculus. DeepArk accurately predicts the presence of thousands of different context-specific regulatory features, including chromatin states, histone marks, and transcription factors. In vivo studies show that DeepArk can predict the regulatory impact of any genomic variant (including rare or not previously observed), and enables the regulatory annotation of understudied model species.

Spatial Transcriptional Mapping of the Human Nephrogenic Program

N Lindström, R. Sealfon, X. Chen, R Parvez, A Ransick, G De Sena Brandine, J Guo, B Hill, T Tran, A Kim, J Zhou, A Tadych, A. Watters, A. Wong, E. Lovero, B Grubbs, M Thornton, J McMahon, A Smith, S Ruffins, C Armit, O. Troyanskaya, A McMahon

defects affecting 3% of newborns. The human kidney develops over a 30-week period in which a nephron progenitor pool gives rise to around a million nephrons. To establish a framework for human nephrogenesis, we spatially resolved a stereotypical process by which equipotent nephron progenitors generate a nephron anlagen, then applied data-driven approaches to construct three-dimensional protein maps on anatomical models of the nephrogenic program. Single cell RNA sequencing identified novel progenitor states which were spatially mapped to the nephron anatomy enabling the generation of functional gene-networks predicting interactions within and between nephron cell-types. Network mining identified known developmental disease genes and predicts new targets of interest. The spatially resolved nephrogenic program made available through the Human Nephrogenesis Atlas (https://sckidney.flatironinstitute.org/) will facilitate an understanding of kidney development and disease, and enhance efforts to generate new kidney structures.

Visualizing ’omic feature rankings and log-ratios using Qurro

M. Fedarko, C. Martino, J. Morton, A. González, G. Rahman, C. Marotz, J. Minich, E. Allen, R. Knight

Many tools for dealing with compositional ‘’omics’ data produce feature-wise values that can be ranked in order to describe features’ associations with some sort of variation. These values include differentials (which describe features’ associations with specified covariates) and feature loadings (which describe features’ associations with variation along a given axis in a biplot). Although prior work has discussed the use of these ‘rankings’ as a starting point for exploring the log-ratios of particularly high- or low-ranked features, such exploratory analyses have previously been done using custom code to visualize feature rankings and the log-ratios of interest. This approach is laborious, prone to errors and raises questions about reproducibility. To address these problems we introduce Qurro, a tool that interactively visualizes a plot of feature rankings (a ‘rank plot’) alongside a plot of selected features’ log-ratios within samples (a ‘sample plot’). Qurro’s interface includes various controls that allow users to select features from along the rank plot to compute a log-ratio; this action updates both the rank plot (through highlighting selected features) and the sample plot (through displaying the current log-ratios of samples). Here, we demonstrate how this unique interface helps users explore feature rankings and log-ratios simply and effectively.
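
As a rough illustration of the per-sample log-ratio that the abstract refers to, the sketch below computes one value per sample from two selected feature sets. It is not Qurro's code. It assumes the log-ratio is the natural log of the summed numerator abundances over the summed denominator abundances, and that samples with a zero sum on either side are left undefined. The function name `log_ratio`, the feature names, and the counts are all hypothetical.

```python
import math


def log_ratio(sample_counts, numerator, denominator):
    """Return ln(sum of numerator counts / sum of denominator counts).

    Returns None when either sum is zero, because the log-ratio is
    undefined there (an assumption of this sketch, not a documented rule).
    """
    top = sum(sample_counts.get(feature, 0) for feature in numerator)
    bottom = sum(sample_counts.get(feature, 0) for feature in denominator)
    if top <= 0 or bottom <= 0:
        return None
    return math.log(top / bottom)


# Hypothetical feature counts for three samples.
samples = {
    "sample_A": {"taxon_1": 30, "taxon_2": 10, "taxon_3": 5},
    "sample_B": {"taxon_1": 4, "taxon_2": 0, "taxon_3": 16},
    "sample_C": {"taxon_1": 0, "taxon_2": 0, "taxon_3": 9},
}

# Features that a user might pick from the high and low ends of a ranking.
numerator = ["taxon_1", "taxon_2"]
denominator = ["taxon_3"]

ratios = {
    name: log_ratio(counts, numerator, denominator)
    for name, counts in samples.items()
}

for name, value in ratios.items():
    print(name, "undefined" if value is None else round(value, 3))
```

Summing the abundances before taking the logarithm means that a single zero count within a selected feature set does not by itself make the sample's log-ratio undefined; only an all-zero numerator or denominator does.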
