698 Publications

Engineered coiled-coil protein microfibers

J. Hume, J. Sun, R. Jacquet, D. Renfrew, J. Martin, R. Bonneau, M.L. Gilchrist, J.K. Montclare

The fabrication of de novo proteins able to self-assemble on the nano- to meso-length scales is critical in the development of protein-based biomaterials in nanotechnology and medicine. Here we report the design and characterization of a protein engineered coiled-coil that not only assembles into microfibers, but also can bind hydrophobic small molecules. Under ambient conditions, the protein forms fibers with nanoscale structure possessing large aspect ratios formed by bundles of α-helical homopentameric assemblies, which further assemble into mesoscale fibers in the presence of curcumin through aggregation. Surprisingly, these biosynthesized fibers are able to form at remarkably low concentrations. Unlike previously designed coiled-coil fibers, these engineered protein microfibers can bind the small molecule curcumin throughout the assembly, serving as a depot for encapsulation and delivery of other chemical agents within protein-based 3D microenvironments.

Negative Example Selection for Protein Function Prediction: The NoGO Database

N. Youngs, D. Penfold-Brown, R. Bonneau, D. Shasha

Negative examples – genes that are known not to carry out a given protein function – are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout, and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text classification, which are the current state-of-the-art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms, for general use (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html).

Rational Design of Topographical Helix Mimics as Potent Inhibitors of Protein–Protein Interactions

B.B. Lao, K. Drew, D.A. Guarracino, T.F. Brewer, D.W. Heindel, R. Bonneau, P.S. Arora

Protein–protein interactions encompass large surface areas, but often a handful of key residues dominate the binding energy landscape. Rationally designed small molecule scaffolds that reproduce the relative positioning and disposition of important binding residues, termed “hotspot residues”, have been shown to successfully inhibit specific protein complexes. Although this strategy has led to development of novel synthetic inhibitors of protein complexes, often direct mimicry of natural amino acid residues does not lead to potent inhibitors. Experimental screening of focused compound libraries is used to further optimize inhibitors but the number of possible designs that can be efficiently synthesized and experimentally tested in academic settings is limited. We have applied the principles of computational protein design to optimization of nonpeptidic helix mimics as ligands for protein complexes. We describe the development of computational tools to design helix mimetics from canonical and noncanonical residue libraries and their application to two therapeutically important protein–protein interactions: p53-MDM2 and p300-HIF1α. The overall study provides a streamlined approach for discovering potent peptidomimetic inhibitors of protein–protein interactions.

Helminth Colonization Is Associated with Increased Diversity of the Gut Microbiota

S.C. Lee, M.S. Tang, Y.A.L. Lim, S.H. Choy, Z.D. Kurtz, L.M. Cox, U.M. Gundra, I. Cho, R. Bonneau, M.J. Blaser, K.H. Chua, P. Loke

Soil-transmitted helminths colonize more than 1.5 billion people worldwide, yet little is known about how they interact with bacterial communities in the gut microbiota. Differences in the gut microbiota between individuals living in developed and developing countries may be partly due to the presence of helminths, since they predominantly infect individuals from developing countries, such as the indigenous communities in Malaysia we examine in this work. We compared the composition and diversity of bacterial communities from the fecal microbiota of 51 people from two villages in Malaysia, of whom 36 (70.6%) were infected by helminths. The 16S rRNA V4 region was sequenced at an average of nineteen thousand sequences per sample. Helminth-colonized individuals had greater species richness and number of observed OTUs with enrichment of Paraprevotellaceae, especially with Trichuris infection. We developed a new approach of combining centered log-ratio (clr) transformation for OTU relative abundances with sparse Partial Least Squares Discriminant Analysis (sPLS-DA) to enable more robust predictions of OTU interrelationships. These results suggest that helminths may have an impact on the diversity, bacterial community structure and function of the gut microbiota.
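
For readers unfamiliar with the clr step named in this abstract, the following is a minimal sketch of a centered log-ratio transform for one sample's OTU counts. It is an illustration only, not the authors' pipeline: the function name `clr`, the example counts, and the use of a pseudocount to handle zero counts are all assumptions made here, and the sPLS-DA step is not shown.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's OTU counts.

    Each value becomes log(count + pseudocount) minus the mean of those
    logs across the sample. The pseudocount is an assumed convenience for
    zero counts; other zero-handling strategies exist.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical counts for four OTUs in a single sample.
sample = [120, 30, 0, 850]
transformed = clr(sample)
```

By construction the transformed values of a sample sum to zero, and with `pseudocount=0` the result does not change when all counts are multiplied by a constant, which is why the transform is commonly applied to relative abundance data.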

A Rotamer Library to Enable Modeling and Design of Peptoid Foldamers

D. Renfrew, T.W. Craven, G.L. Butterfoss, K. Kirshenbaum, R. Bonneau

Peptoids are a family of synthetic oligomers composed of N-substituted glycine units. Along with other “foldamer” systems, peptoid oligomer sequences can be predictably designed to form a variety of stable secondary structures. It is not yet evident if foldamer design can be extended to reliably create tertiary structure features that mimic more complex biomolecular folds and functions. Computational modeling and prediction of peptoid conformations will likely play a critical role in enabling complex biomimetic designs. We introduce a computational approach to provide accurate conformational and energetic parameters for peptoid side chains needed for successful modeling and design. We find that peptoids can be described by a “rotamer” treatment, similar to that established for proteins, in which the peptoid side chains display rotational isomerism to populate discrete regions of the conformational landscape. Because of the insufficient number of solved peptoid structures, we have calculated the relative energies of side-chain conformational states to provide a backbone-dependent (BBD) rotamer library for a set of 54 different peptoid side chains. We evaluated two rotamer library development methods that employ quantum mechanics (QM) and/or molecular mechanics (MM) energy calculations to identify side-chain rotamers. We show by comparison to experimental peptoid structures that both methods provide an accurate prediction of peptoid side chain placements in folded peptoid oligomers and at protein interfaces. We have incorporated our peptoid rotamer libraries into ROSETTA, a molecular design package previously validated in the context of protein design and structure prediction.

Collier/OLF/EBF-Dependent Transcriptional Dynamics Control Pharyngeal Muscle Specification from Primed Cardiopharyngeal Progenitors

F. Razy-Krajka, K. Lam, W. Wang, A. Stolfi, M. Joly, R. Bonneau, L. Christiaen

In vertebrates, pluripotent pharyngeal mesoderm progenitors produce the cardiac precursors of the second heart field as well as the branchiomeric head muscles and associated stem cells. However, the mechanisms underlying the transition from multipotent progenitors to distinct muscle precursors remain obscured by the complexity of vertebrate embryos. Using Ciona intestinalis as a simple chordate model, we show that bipotent cardiopharyngeal progenitors are primed to activate both heart and pharyngeal muscle transcriptional programs, which progressively become restricted to corresponding precursors. The transcription factor COE (Collier/OLF/EBF) orchestrates the transition to pharyngeal muscle fate both by promoting an MRF-associated myogenic program in myoblasts and by maintaining an undifferentiated state in their sister cells through Notch-mediated lateral inhibition. The latter are stem cell-like muscle precursors that form most of the juvenile pharyngeal muscles. We discuss the implications of our findings for the development and evolution of the chordate cardiopharyngeal mesoderm.

Bacillus subtilis systems biology

A.R. Bate, R. Bonneau, P. Eichenberger

Endospore-forming bacteria, with Bacillus subtilis being the prevalent model organism, belong to the phylum Firmicutes. Although the last common ancestor of all Firmicutes is likely to have been an endospore-forming species, not every lineage in the phylum has maintained the ability to produce endospores (hereafter, spores). In 1997, the release of the full genome sequence for B. subtilis strain 168 marked the beginning of the genomic era for the study of spore formation (sporulation). In this original genome sequence, 139 of the 4,100 protein-coding genes were annotated as sporulation genes. By the time a revised genome sequence with updated annotations was published in 2009, that number had increased significantly, especially since transcriptional profiling studies (transcriptomics) led to the identification of several genes expressed under the control of known sporulation transcription factors. Over the past decade, genome sequences for multiple spore-forming species have been released (including several strains in the Bacillus anthracis/Bacillus cereus group and many Clostridium species), and phylogenomic analyses have revealed many conserved sporulation genes. Parallel advances in transcriptomics led to the identification of small untranslated regulatory RNAs (sRNAs), including some that are expressed during sporulation. An extended array of -omics techniques, i.e., techniques designed to probe gene function on a genome-wide scale, such as proteomics, metabolomics, and high-throughput protein localization studies, have been implemented in microbiology. Combined with the use of new computational methods for predicting gene function and inferring regulatory relationships on a global scale, these -omics approaches are uncovering novel information about sporulation and a variety of other bacterial cell processes.

Global Quantitative Modeling of Chromatin Factor Interactions

Chromatin is the driver of gene regulation, yet understanding the molecular interactions underlying chromatin factor combinatorial patterns (or the “chromatin codes”) remains a fundamental challenge in chromatin biology. Here we developed a global modeling framework that leverages chromatin profiling data to produce a systems-level view of the macromolecular complex of chromatin. Our model utilizes maximum entropy modeling with regularization-based structure learning to statistically dissect dependencies between chromatin factors and produce an accurate probability distribution of the chromatin code. Our unsupervised quantitative model, trained on genome-wide chromatin profiles of 73 histone marks and chromatin proteins from modENCODE, enabled various data-driven inferences about chromatin profiles and interactions. We provided a highly accurate predictor of chromatin factor pairwise interactions validated by known experimental evidence, and for the first time enabled higher-order interaction prediction. Our predictions can thus help guide future experimental studies. The model can also serve as an inference engine for predicting unknown chromatin profiles: we demonstrated that with this approach we can leverage data from well-characterized cell types to help understand less-studied cell types or conditions.

FIREWACh: high-throughput functional detection of transcriptional regulatory modules in mammalian cells

M. Murtha, Z. Tokcaer-Keskin, Z. Tang, F. Strino, X. Chen, Y. Wang, X. Xi, C. Basilico, S. Brown, R. Bonneau, Y. Kluger, L. Dailey

Promoters and enhancers establish precise gene transcription patterns. The development of functional approaches for their identification in mammalian cells has been complicated by the size of these genomes. Here we report a high-throughput functional assay for directly identifying active promoter and enhancer elements called FIREWACh (Functional Identification of Regulatory Elements Within Accessible Chromatin), which we used to simultaneously assess over 80,000 DNA fragments derived from nucleosome-free regions within the chromatin of embryonic stem cells (ESCs) and identify 6,364 active regulatory elements. Many of these represent newly discovered ESC-specific enhancers, showing enriched binding-site motifs for ESC-specific transcription factors including SOX2, POU5F1 (OCT4) and KLF4. The application of FIREWACh to additional cultured cell types will facilitate functional annotation of the genome and expand our view of transcriptional network dynamics.

March 23, 2014