CCB: Publications

Combining machine learning with structure-based protein design to predict and engineer post-translational modifications of proteins

Moritz Ertelt, V. Mulligan, et al.

Post-translational modifications (PTMs) of proteins play a vital role in their function and stability. These modifications influence protein folding, signaling, protein-protein interactions, enzyme activity, binding affinity, aggregation, degradation, and much more. To date, over 400 types of PTMs have been described, representing chemical diversity well beyond the genetically encoded amino acids. Such modifications pose a challenge to the successful design of proteins, but also represent a major opportunity to diversify the protein engineering toolbox. To this end, we first trained artificial neural networks (ANNs) to predict eighteen of the most abundant PTMs, including protein glycosylation, phosphorylation, methylation, and deamidation. In a second step, these models were implemented inside the computational protein modeling suite Rosetta, which allows flexible combination with existing protocols to model the modified sites and understand their impact on protein stability as well as function. Lastly, we developed a new design protocol that either maximizes or minimizes the predicted probability of a particular site being modified. We find that this combination of ANN prediction and structure-based design can enable the modification of existing, as well as the introduction of novel, PTMs. The potential applications of our work include, but are not limited to, glycan masking of epitopes, strengthening protein-protein interactions through phosphorylation, as well as protecting proteins from deamidation liabilities. These applications are especially important for the design of new protein therapeutics where PTMs can drastically change the therapeutic properties of a protein. Our work adds novel tools to Rosetta’s protein engineering toolbox that allow for the rational design of PTMs.

Protein-Engineered Fibers For Drug Encapsulation Traceable via 19F Magnetic Resonance

Dustin Britton, Jakub Legocki, D. Renfrew, et al.

Theranostic materials research is experiencing rapid growth driven by the interest in integrating both therapeutic and diagnostic modalities. These materials offer the unique capability to not only provide treatment but also track the progression of a disease. However, to create an ideal theranostic biomaterial without compromising drug encapsulation, diagnostic imaging must be optimized for improved sensitivity and spatial localization. Herein, we create a protein-engineered fluorinated coiled-coil fiber, Q2TFL, capable of improved sensitivity to 19F magnetic resonance spectroscopy (MRS) detection. Leveraging residue-specific noncanonical amino acid incorporation of trifluoroleucine (TFL) into the coiled-coil, Q2, which self-assembles into nanofibers, we generate Q2TFL. We demonstrate that fluorination results in a greater increase in thermostability and 19F magnetic resonance detection compared to the nonfluorinated parent, Q2. Q2TFL also exhibits linear ratiometric 19F MRS thermoresponsiveness, allowing it to act as a temperature probe. Furthermore, we explore the ability of Q2TFL to encapsulate the anti-inflammatory small molecule, curcumin (CCM), and its impact on the coiled-coil structure. Q2TFL also provides hyposignal contrast in 1H MRI, echogenic signal with high-frequency ultrasound, and sensitive detection by 19F MRS in vivo, illustrating the fluorination of coiled-coils for supramolecular assembly and their use with 1H MRI, 19F MRS, and high-frequency ultrasound as multimodal theranostic agents.

Scaffold Matcher: A CMA-ES based algorithm for identifying hotspot aligned peptidomimetic scaffolds

Erin R. Claussen, D. Renfrew, Christian L. Müller, Kevin Drew

The design of protein interaction inhibitors is a promising approach to address aberrant protein interactions that cause disease. One strategy in designing inhibitors is to use peptidomimetic scaffolds that mimic the natural interaction interface. A central challenge in using peptidomimetics as protein interaction inhibitors, however, is determining how best the molecular scaffold aligns to the residues of the interface it is attempting to mimic. Here we present the Scaffold Matcher algorithm that aligns a given molecular scaffold onto hotspot residues from a protein interaction interface. To optimize the degrees of freedom of the molecular scaffold, we implement the covariance matrix adaptation evolution strategy (CMA-ES), a state-of-the-art derivative-free optimization algorithm, in Rosetta. To evaluate the performance of the CMA-ES, we used 26 peptides from the FlexPepDock Benchmark and compared it with three other algorithms in Rosetta: Rosetta's default minimizer, a Monte Carlo protocol of small backbone perturbations, and a genetic algorithm. We test the algorithms' performance on their ability to align a molecular scaffold to a series of hotspot residues (i.e., constraints) along native peptides. Of the four methods, CMA-ES was able to find the lowest energy conformation for all 26 benchmark peptides. Additionally, as a proof of concept, we apply the Scaffold Matcher algorithm with CMA-ES to align a peptidomimetic oligooxopiperazine scaffold to the hotspot residues of the substrate of the main protease of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Our implementation of CMA-ES into Rosetta allows for an alternative optimization method to be used on macromolecular modeling problems with rough energy landscapes. Finally, our Scaffold Matcher algorithm allows for the identification of initial conformations of interaction inhibitors that can be further designed and optimized as high-affinity reagents.
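To give a feel for derivative-free optimization of scaffold degrees of freedom, here is a minimal sketch in Python. It is not the Rosetta implementation and it is not full CMA-ES: it is a simplified evolution strategy with isotropic Gaussian sampling and a fixed step-size decay, omitting the covariance matrix and step-size adaptation that define CMA-ES. The 2D rigid-body alignment problem, the coordinates, and all function names are hypothetical, chosen only for illustration.

```python
import math
import random


def alignment_cost(params, scaffold, hotspots):
    """Sum of squared distances after rotating and translating the scaffold (toy 2D objective)."""
    theta, tx, ty = params
    c, s = math.cos(theta), math.sin(theta)
    cost = 0.0
    for (x, y), (hx, hy) in zip(scaffold, hotspots):
        px, py = c * x - s * y + tx, s * x + c * y + ty
        cost += (px - hx) ** 2 + (py - hy) ** 2
    return cost


def simple_es(cost, x0, sigma=1.0, lam=20, mu=5, iters=200, seed=0):
    """Simplified evolution strategy: sample around a mean, recombine the best mu samples.

    Assumption: isotropic sampling with geometric step-size decay stands in for
    the covariance and step-size adaptation used by real CMA-ES.
    """
    rng = random.Random(seed)
    mean = list(x0)
    best_x, best_f = list(x0), cost(x0)
    for _ in range(iters):
        # Sample lam candidates from an isotropic Gaussian centered on the mean.
        pop = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean] for _ in range(lam)]
        pop.sort(key=cost)
        # Recombine: the new mean is the average of the mu best candidates.
        elite = pop[:mu]
        mean = [sum(p[d] for p in elite) / mu for d in range(len(mean))]
        # Track the best candidate seen so far (elitist bookkeeping).
        f0 = cost(pop[0])
        if f0 < best_f:
            best_x, best_f = pop[0], f0
        sigma *= 0.97  # shrink the search radius over time
    return best_x, best_f


# Hypothetical scaffold atoms and hotspot targets (targets = scaffold rotated and shifted).
scaffold = [(0.0, 0.0), (1.5, 0.0), (1.5, 1.5), (0.0, 3.0)]
true_theta, true_t = 0.5, (1.0, -2.0)
hotspots = [
    (math.cos(true_theta) * x - math.sin(true_theta) * y + true_t[0],
     math.sin(true_theta) * x + math.cos(true_theta) * y + true_t[1])
    for x, y in scaffold
]

x0 = [0.0, 0.0, 0.0]
start_cost = alignment_cost(x0, scaffold, hotspots)
best_x, best_f = simple_es(lambda p: alignment_cost(p, scaffold, hotspots), x0)
print("start cost:", start_cost, "best cost:", best_f)
```

The optimizer only ever evaluates the cost function, never its gradient, which is the property that makes this family of methods attractive on rough energy landscapes.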

Sequence-structure-function relationships in the microbial protein universe

J. Koehler, Pawel Szczerbiak, D. Renfrew, A. Pataki, N. Carriero, I. Fisk, et al.

For the past half-century, structural biologists relied on the notion that similar protein sequences give rise to similar structures and functions. While this assumption has driven research to explore certain parts of the protein universe, it disregards spaces that don’t rely on this assumption. Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures. We predict ~200,000 structures for diverse protein sequences from 1,003 representative genomes across the microbial tree of life and annotate them functionally on a per-residue basis. Structure prediction is accomplished using the World Community Grid, a large-scale citizen science initiative. The resulting database of structural models is complementary to the AlphaFold database with regard to domains of life as well as sequence diversity and sequence length. We identify 148 novel folds and describe examples where we map specific functions to structural motifs. We also show that the structural space is continuous and largely saturated, highlighting the need for a shift in focus across all branches of biology, from obtaining structures to putting them into context and from sequence-based to sequence-structure-function based meta-omics analyses.

Docking cholesterol to integral membrane proteins with Rosetta

Brennica Marlow, Georg Kuenze, Jens Meiler, J. Koehler

Lipid molecules such as cholesterol interact with the surface of integral membrane proteins (IMP) in a mode different from drug-like molecules in a protein binding pocket. These differences are due to the lipid molecule’s shape, the membrane’s hydrophobic environment, and the lipid’s orientation in the membrane. We can use the recent increase in experimental structures in complex with cholesterol to understand protein-cholesterol interactions. We developed the RosettaCholesterol protocol consisting of (1) a prediction phase using an energy grid to sample and score native-like binding poses and (2) a specificity filter to calculate the likelihood that a cholesterol interaction site may be specific. We used a multi-pronged benchmark (self-dock, flip-dock, cross-dock, and global-dock) of protein-cholesterol complexes to validate our method. RosettaCholesterol improved sampling and scoring of native poses over the standard RosettaLigand baseline method in 91% of cases and performs better regardless of benchmark complexity. On the β2AR, our method found one likely-specific site, which is described in the literature. The RosettaCholesterol protocol quantifies cholesterol binding site specificity. Our approach provides a starting point for high-throughput modeling and prediction of cholesterol binding sites for further experimental validation.

Tuning a coiled-coil hydrogel via computational design of supramolecular fiber assembly

D. Britton, M. Meleties, D. Renfrew, et al.

The previously reported Q is a thermoresponsive coiled-coil protein capable of higher-order supramolecular assembly into fibers and hydrogels with upper critical solution temperature (UCST) behavior. Here, we introduce a new coiled-coil protein that is redesigned to disfavor lateral growth of its fibers and thus achieve a higher crosslinking density within the formed hydrogel. We also introduce a favorable hydrophobic mutation to the pore of the coiled-coil domain for increased thermostability of the protein. We note that an increase in storage modulus of the hydrogel and crosslinking density is coupled with a decrease in fiber diameter. We further fully characterize our α-helical coiled-coil (Q2) hydrogel for its structure, nano-assembly, and rheology relative to our previous single domain protein, Q, over the time of its gelation, demonstrating the nature of our hydrogel self-assembly system. In this vein, we also characterize the ability of Q2 to encapsulate the small hydrophobic molecule, curcumin, and its impact on the mechanical properties of Q2. The design parameters here not only show the importance of electrostatic potential in self-assembly but also provide a step towards predictable design of electrostatic protein interactions.

Fluorescent azobenzene-confined coiled-coil mesofibers

Kamia Punia, D. Renfrew

Fluorescent protein biomaterials have important applications such as bioimaging in pharmacological studies. Self-assembly of proteins, especially into fibrils, is known to produce fluorescence in the blue band. We have previously shown that our engineered protein, which is capable of self-assembly into nanofibers, can be driven to aggregate into mesofibers by encapsulation of a small hydrophobic molecule. Conversely, azobenzenes are hydrophobic small molecules that are virtually non-fluorescent in solution due to their highly efficient photoisomerization. However, they demonstrate fluorogenic properties upon confinement in nanoscale assemblies, which reduces the non-radiative photoisomerization. Here, we report the fluorescence of a hybrid protein-small molecule system in which azobenzene is confined in our protein assembly, leading to fiber thickening and increased fluorescence. We show our engineered protein Q encapsulates AzoCholine, bearing a photoswitchable azobenzene moiety, in the hydrophobic pore to produce fluorescent mesofibers. This study further investigates the photocontrol of protein conformation as well as fluorescence of an azobenzene-containing biomaterial.

How to Design Peptides

Joseph Dodd-O, Amanda M Acevedo-Jake, V. Mulligan, et al.

Novel design of proteins to target receptors for treatment or tissue augmentation has come to the fore owing to advancements in computing power, modeling frameworks, and translational successes. Shorter proteins, or peptides, can offer combinatorial synergies with dendrimer, polymer, or other peptide carriers for enhanced local signaling, which larger proteins may sterically hinder. Here, we present a generalized method for designing a novel peptide. We first show how to create a script protocol that can be used to iteratively optimize and screen novel peptide sequences for binding a target protein. We present a step-by-step introduction to utilizing file repositories, databases, and the Rosetta software suite. RosettaScripts, an .xml interface that allows for sequential functions to be performed, is used to order the functions for repeatable performance. These strategies may lead to more groups venturing into computational design, which may result in synergies from artificial intelligence/machine learning (AI/ML) to phage display and screening. Importantly, the beginner is expected to be able to design their first peptide ligand and begin their journey in peptide drug discovery. Generally, these peptides could potentially be used to interact with any enzyme or receptor, for example, in the study of chemokines and their interactions with glycosaminoglycans and their receptors.

Multibody molecular docking on a quantum annealer

Mohit Pandey, Tristan Zaborniak, V. Mulligan, et al.

Molecular docking, which aims to find the most stable interacting configuration of a set of molecules, is of critical importance to drug discovery. Although a considerable number of classical algorithms have been developed to carry out molecular docking, most focus on the limiting case of docking two molecules. Since the number of possible configurations of N molecules is exponential in N, those exceptions which permit docking of more than two molecules scale poorly, requiring exponential resources to find high-quality solutions. Here, we introduce a one-hot encoded quadratic unconstrained binary optimization (QUBO) formulation of the multibody molecular docking problem, which is suitable for solution by a quantum annealer. Our approach involves a classical pre-computation of pairwise interactions, which scales only quadratically in the number of bodies while permitting well-vetted scoring functions like the Rosetta REF2015 energy function to be used. In a second step, we use the quantum annealer to sample low-energy docked configurations efficiently, considering all possible docked configurations simultaneously through quantum superposition. We show that we are able to minimize the time needed to find diverse low-energy docked configurations by tuning the strength of the penalty used to enforce the one-hot encoding, demonstrating a 3-4 fold improvement in solution quality and diversity over performance achieved with conventional penalty strengths. By mapping the configurational search to a form compatible with current- and future-generation quantum annealers, this work provides an alternative means of solving multibody docking problems that may prove to have performance advantages for large problems, potentially circumventing the exponential scaling of classical approaches and permitting a much more efficient solution to a problem central to drug discovery and validation pipelines.
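The one-hot QUBO idea can be illustrated with a small sketch in Python. This is not the authors' code: the pairwise energies are random placeholders rather than REF2015 scores, the problem size and penalty strength are arbitrary, and the quantum annealer is replaced by brute-force enumeration so the example runs anywhere. Each binary variable x[i, k] is 1 when body i takes candidate pose k, and a penalty P·(Σ_k x[i, k] − 1)² per body enforces that exactly one pose is chosen.

```python
import itertools
import random


def build_qubo(pair_energy, n_bodies, n_poses, penalty):
    """Return (Q, offset): upper-triangular QUBO coefficients and a constant term."""
    def idx(i, k):
        return i * n_poses + k  # flatten (body, pose) into one variable index

    Q = {}

    def add(a, b, value):
        key = (min(a, b), max(a, b))
        Q[key] = Q.get(key, 0.0) + value

    # Pre-computed pairwise interaction energies between poses of different bodies.
    for (i, k, j, l), energy in pair_energy.items():
        add(idx(i, k), idx(j, l), energy)

    # One-hot penalty, expanded using x*x == x for binary variables:
    # P*(sum_k x_k - 1)^2 = -P*sum_k x_k + 2P*sum_{k<l} x_k x_l + P
    for i in range(n_bodies):
        for k in range(n_poses):
            add(idx(i, k), idx(i, k), -penalty)
            for l in range(k + 1, n_poses):
                add(idx(i, k), idx(i, l), 2.0 * penalty)
    return Q, penalty * n_bodies


def qubo_energy(Q, offset, x):
    return offset + sum(c * x[a] * x[b] for (a, b), c in Q.items())


def direct_energy(pair_energy, chosen):
    """Energy of a configuration given one chosen pose index per body."""
    n = len(chosen)
    return sum(pair_energy[(i, chosen[i], j, chosen[j])]
               for i in range(n) for j in range(i + 1, n))


# Hypothetical toy instance: 3 bodies, 2 candidate poses each, random energies in [-1, 1].
n_bodies, n_poses, penalty = 3, 2, 10.0
rng = random.Random(0)
pair_energy = {
    (i, k, j, l): rng.uniform(-1.0, 1.0)
    for i in range(n_bodies) for j in range(i + 1, n_bodies)
    for k in range(n_poses) for l in range(n_poses)
}

Q, offset = build_qubo(pair_energy, n_bodies, n_poses, penalty)

# Stand-in for the annealer: enumerate all 2^6 bit strings and keep the lowest energy.
best_x = min(itertools.product((0, 1), repeat=n_bodies * n_poses),
             key=lambda x: qubo_energy(Q, offset, x))
best_e = qubo_energy(Q, offset, best_x)

# Decode the bit string back into one pose index per body.
chosen = [best_x[i * n_poses:(i + 1) * n_poses].index(1) for i in range(n_bodies)]
print("chosen poses:", chosen, "energy:", best_e)
```

In this toy instance the penalty is large compared with the interaction energies, so the lowest-energy bit string is always a valid one-hot assignment. The abstract's point is that the penalty strength is itself a tunable quantity that affects how well an annealer samples low-energy configurations.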

Accurate de novo design of membrane-traversing macrocycles

G. Bhardwaj, J. O’Connor, V. Mulligan, et al.

We use computational design coupled with experimental characterization to systematically investigate the design principles for macrocycle membrane permeability and oral bioavailability. We designed 184 6–12 residue macrocycles with a wide range of predicted structures containing noncanonical backbone modifications and experimentally determined structures of 35; 29 are very close to the computational models. With such control, we show that membrane permeability can be systematically achieved by ensuring all amide (NH) groups are engaged in internal hydrogen bonding interactions. 84 designs over the 6–12 residue size range cross membranes with an apparent permeability greater than 1 × 10⁻⁶ cm/s. Designs with exposed NH groups can be made membrane permeable through the design of an alternative isoenergetic fully hydrogen-bonded state favored in the lipid membrane. The ability to robustly design membrane-permeable and orally bioavailable peptides with high structural accuracy should contribute to the next generation of designed macrocycle therapeutics.
