CCB: Publications

Bacillus subtilis systems biology

A.R. Bate, R. Bonneau, P. Eichenberger

Endospore-forming bacteria, with Bacillus subtilis being the prevalent model organism, belong to the phylum Firmicutes. Although the last common ancestor of all Firmicutes is likely to have been an endospore-forming species, not every lineage in the phylum has maintained the ability to produce endospores (hereafter, spores). In 1997, the release of the full genome sequence for B. subtilis strain 168 marked the beginning of the genomic era for the study of spore formation (sporulation). In this original genome sequence, 139 of the 4,100 protein-coding genes were annotated as sporulation genes. By the time a revised genome sequence with updated annotations was published in 2009, that number had increased significantly, especially since transcriptional profiling studies (transcriptomics) led to the identification of several genes expressed under the control of known sporulation transcription factors. Over the past decade, genome sequences for multiple spore-forming species have been released (including several strains in the Bacillus anthracis/Bacillus cereus group and many Clostridium species), and phylogenomic analyses have revealed many conserved sporulation genes. Parallel advances in transcriptomics led to the identification of small untranslated regulatory RNAs (sRNAs), including some that are expressed during sporulation. An extended array of -omics techniques, i.e., techniques designed to probe gene function on a genome-wide scale, such as proteomics, metabolomics, and high-throughput protein localization studies, have been implemented in microbiology. Combined with the use of new computational methods for predicting gene function and inferring regulatory relationships on a global scale, these -omics approaches are uncovering novel information about sporulation and a variety of other bacterial cell processes.

Global Quantitative Modeling of Chromatin Factor Interactions

J. Zhou, O. Troyanskaya

Chromatin is the driver of gene regulation, yet understanding the molecular interactions underlying chromatin factor combinatorial patterns (or the “chromatin codes”) remains a fundamental challenge in chromatin biology. Here we developed a global modeling framework that leverages chromatin profiling data to produce a systems-level view of the macromolecular complex of chromatin. Our model utilizes maximum entropy modeling with regularization-based structure learning to statistically dissect dependencies between chromatin factors and produce an accurate probability distribution of chromatin code. Our unsupervised quantitative model, trained on genome-wide chromatin profiles of 73 histone marks and chromatin proteins from modENCODE, enabled making various data-driven inferences about chromatin profiles and interactions. We provided a highly accurate predictor of chromatin factor pairwise interactions validated by known experimental evidence, and for the first time enabled higher-order interaction prediction. Our predictions can thus help guide future experimental studies. The model can also serve as an inference engine for predicting unknown chromatin profiles: we demonstrated that with this approach we can leverage data from well-characterized cell types to help understand less-studied cell types or conditions.

FIREWACh: high-throughput functional detection of transcriptional regulatory modules in mammalian cells

M. Murtha, Z. Tokcaer-Keskin, Z. Tang, F. Strino, X. Chen, Y. Wang, X. Xi, C. Basilico, S. Brown, R. Bonneau, Y. Kluger, L. Dailey

Promoters and enhancers establish precise gene transcription patterns. The development of functional approaches for their identification in mammalian cells has been complicated by the size of these genomes. Here we report a high-throughput functional assay for directly identifying active promoter and enhancer elements called FIREWACh (Functional Identification of Regulatory Elements Within Accessible Chromatin), which we used to simultaneously assess over 80,000 DNA fragments derived from nucleosome-free regions within the chromatin of embryonic stem cells (ESCs) and identify 6,364 active regulatory elements. Many of these represent newly discovered ESC-specific enhancers, showing enriched binding-site motifs for ESC-specific transcription factors including SOX2, POU5F1 (OCT4) and KLF4. The application of FIREWACh to additional cultured cell types will facilitate functional annotation of the genome and expand our view of transcriptional network dynamics.

Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction

J. Zhou, O. Troyanskaya

Predicting protein secondary structure is a fundamental problem in protein structure prediction. Here we present a new supervised generative stochastic network (GSN) based method to predict local secondary structure with deep hierarchical representations. GSN is a recently proposed deep learning technique (Bengio & Thibodeau-Laufer, 2013) for globally training deep generative models. We present the supervised extension of GSN, which learns a Markov chain to sample from a conditional distribution, and apply it to protein structure prediction. To scale the model to full-sized, high-dimensional data, like protein sequences with hundreds of amino acids, we introduce a convolutional architecture, which allows efficient learning across multiple layers of hierarchical representations. Our architecture uniquely focuses on predicting structured low-level labels informed with both low and high-level representations learned by the model. In our application this corresponds to labeling the secondary structure state of each amino-acid residue. We trained and tested the model on separate sets of non-homologous proteins sharing less than 30% sequence identity. Our model achieves 66.4% Q8 accuracy on the CB513 dataset, better than the previously reported best performance of 64.9% (Wang et al., 2011) for this challenging secondary structure prediction problem.

Semisynthesis of Peptoid–Protein Hybrids by Chemical Ligation at Serine

P.M. Levine, T.W. Craven, R. Bonneau, K. Kirshenbaum

Chemical ligation protocols were explored for generating semisynthetic peptoid–protein hybrid architectures containing a native serine residue at the ligation site. Peptoid oligomers bearing C-terminal salicylaldehyde esters were synthesized and ligated to the N-terminus of the RNase S protein or the therapeutic hormone PTH(1–34) polypeptide. This technique will expand the repertoire of strategies to enable design of hybrid macromolecules with novel structures and functions not accessible to fully biosynthesized proteins.

Broad Metabolic Sensitivity Profiling of a Prototrophic Yeast Deletion Collection

B. VanderSluis, O. Troyanskaya

Genome-wide sensitivity screens in yeast have been immensely popular following the construction of a collection of deletion mutants of non-essential genes. However, the auxotrophic markers in this collection preclude experiments on minimal growth medium, one of the most informative metabolic environments. Here we present quantitative growth analysis for mutants in all 4,772 non-essential genes from our prototrophic deletion collection across a large set of metabolic conditions.

Individual and Combined Effects of DNA Methylation and Copy Number Alterations on miRNA Expression in Breast Tumors

M. Aure, O. Troyanskaya

The global effect of copy number and epigenetic alterations on miRNA expression in cancer is poorly understood. In the present study, we integrate genome-wide DNA methylation, copy number and miRNA expression and identify genetic mechanisms underlying miRNA dysregulation in breast cancer.

RESULTS:
We identify 70 miRNAs whose expression was associated with alterations in copy number or methylation, or both. Among these, five miRNA families are represented. Interestingly, the members of these families are encoded on different chromosomes and are complementarily altered by gain or hypomethylation across the patients. In an independent breast cancer cohort of 123 patients, 41 of the 70 miRNAs were confirmed with respect to aberration pattern and association to expression. In vitro functional experiments were performed in breast cancer cell lines with miRNA mimics to evaluate the phenotype of the replicated miRNAs. let-7e-3p, which in tumors is found associated with hypermethylation, is shown to induce apoptosis and reduce cell viability, and low let-7e-3p expression is associated with poorer prognosis. The overexpression of three other miRNAs associated with copy number gain, miR-21-3p, miR-148b-3p and miR-151a-5p, increases proliferation of breast cancer cell lines. In addition, miR-151a-5p enhances the levels of phosphorylated AKT protein.

CONCLUSIONS:
Our data provide novel evidence of the mechanisms behind miRNA dysregulation in breast cancer. The study contributes to the understanding of how methylation and copy number alterations influence miRNA expression, emphasizing miRNA functionality through redundant encoding, and suggests novel miRNAs important in breast cancer.

Defining Cell-Type Specificity at the Transcriptional Level in Human Disease

W. Ju, C. Greene, F. Eichinger, V. Nair, J. Hodgin, M. Bitzer, Y. Lee, Q. Zhu, M. Kehata, M. Li, S. Jiang, M. Rastaldi, C. Cohen, O. Troyanskaya, M. Kretzler

Cell-lineage-specific transcripts are essential for differentiated tissue function, implicated in hereditary organ failure, and mediate acquired chronic diseases. However, experimental identification of cell-lineage-specific genes in a genome-scale manner is infeasible for most solid human tissues. We developed the first genome-scale method to identify genes with cell-lineage-specific expression, even in lineages not separable by experimental microdissection. Our machine-learning-based approach leverages high-throughput data from tissue homogenates in a novel iterative statistical framework. We applied this method to chronic kidney disease and identified transcripts specific to podocytes, key cells in the glomerular filter responsible for hereditary and most acquired glomerular kidney disease. In a systematic evaluation of our predictions by immunohistochemistry, our in silico approach was significantly more accurate (65% accuracy in human) than predictions based on direct measurement of in vivo fluorescence-tagged murine podocytes (23%). Our method identified genes implicated as causal in hereditary glomerular disease and involved in molecular pathways of acquired and chronic renal diseases. Furthermore, based on expression analysis of human kidney disease biopsies, we demonstrated that expression of the podocyte genes identified by our approach is significantly related to the degree of renal impairment in patients. Our approach is broadly applicable to define lineage specificity in both cell physiology and human disease contexts. We provide a user-friendly website that enables researchers to apply this method to any cell-lineage or tissue of interest. Identified cell-lineage-specific transcripts are expected to play essential tissue-specific roles in organogenesis and disease and can provide starting points for the development of organ-specific diagnostics and therapies.

Ontology-aware classification of tissue and cell-type signals in gene expression profiles across platforms and technologies

Y.-S. Lee, A. Krishnan, Q. Zhu, O. Troyanskaya

We present Unveiling RNA Sample Annotation (URSA), a method that leverages the complex tissue/cell-type relationships and simultaneously estimates the probabilities associated with hundreds of tissues/cell-types for any given gene expression profile. URSA provides accurate and intuitive probability values for expression profiles across independent studies and outperforms other methods, irrespective of data preprocessing techniques. Moreover, without re-training, URSA can be used to classify samples from diverse microarray platforms and even from next-generation sequencing technology. Finally, we provide a molecular interpretation for the tissue and cell-type models as the biological basis for URSA’s classifications.

A New System for Comparative Functional Genomics of Saccharomyces Yeasts

A. Caudy, Y. Guan, Y. Jia, C. Hansen, C. DeSevo, A.P. Hayes, J. Agee, J.R. Alvarez-Dominguez, H. Arellano, D. Barrett, C. Bauerle, N. Bisaria, P. Bradley, J.S. Breunig, E. Bush, D. Cappel, E. Capra, W. Chen, J. Clore, P. Combs, C. Doucette, O. Demuren, P. Fellowes, S. Freeman, E. Frenkel, D. Gadala-Maria, R. Gawande, D. Glass, S. Grossberg, A. Gupta, L. Hammonds-Odie, A. Hoisos, J. Hsi, Y. Huang Hsu, S. Inukai, K.J. Karczewski, X. Ke, M. Kojima, S. Leachman, D. Lieber, A. Liebowitz, J. Liu, Y. Liu, T. Martin, J. Rosa Mendoza, C. Myhrvold, C. Millian, S. Pfau, S. Raj, M. Rich, J. Rokicki, W. Rounds, M. Salazar, M. Salesi, R. Sharma, S. Silverman, C. Singer, S. Sinha, M. Staller, P. Stern, H. Tang, S. Weeks, M. Weidmann, A. Wolf, C. Young, J. Yuan, C. Crutchfield, M. McClean, C. Murphy, M. Llinás, D. Botstein, O. Troyanskaya, M. Dunham

Whole-genome sequencing, particularly in fungi, has progressed at a tremendous rate. More difficult, however, is experimental testing of the inferences about gene function that can be drawn from comparative sequence analysis alone. We present a genome-wide functional characterization of a sequenced but experimentally understudied budding yeast, Saccharomyces bayanus var. uvarum (henceforth referred to as S. bayanus), allowing us to map changes over the 20 million years that separate this organism from S. cerevisiae. We first created a suite of genetic tools to facilitate work in S. bayanus. Next, we measured the gene-expression response of S. bayanus to a diverse set of perturbations optimized using a computational approach to cover a diverse array of functionally relevant biological responses. The resulting data set reveals that gene-expression patterns are largely conserved, but significant changes may exist in regulatory networks such as carbohydrate utilization and meiosis. In addition to regulatory changes, our approach identified gene functions that have diverged. The functions of genes in core pathways are highly conserved, but we observed many changes in which genes are involved in osmotic stress, peroxisome biogenesis, and autophagy. A surprising number of genes specific to S. bayanus respond to oxidative stress, suggesting the organism may have evolved under different selection pressures than S. cerevisiae. This work expands the scope of genome-scale evolutionary studies from sequence-based analysis to rapid experimental characterization and could be adopted for functional mapping in any lineage of interest. Furthermore, our detailed characterization of S. bayanus provides a valuable resource for comparative functional genomics studies in yeast.
