2573 Publications

Supercharged coiled-coil protein with N-terminal decahistidine tag boosts siRNA complexation and delivery efficiency of a lipoproteoplex

Jonathan W. Sun, Joseph S. Thomas, D. Renfrew, et al.

Short interfering RNA (siRNA) therapeutics have soared in popularity due to their highly selective and potent targeting of faulty genes, providing a non-palliative approach to address diseases. Despite their potential, effective transfection of siRNA into cells requires the assistance of an accompanying vector. Vectors constructed from non-viral materials, while offering safer and non-cytotoxic profiles, often grapple with lackluster loading and delivery efficiencies, necessitating substantial milligram quantities of expensive siRNA to confer the desired downstream effects. We detail the recombinant synthesis of a diverse series of coiled-coil supercharged protein (CSP) biomaterials systematically designed to investigate the impact of two arginine point mutations (Q39R and N61R) and decahistidine tags on liposomal siRNA delivery. The most efficacious variant, N8, exhibits a twofold increase in its affinity for siRNA and achieves a twofold enhancement in transfection activity with minimal cytotoxicity in vitro. Subsequent analysis unveils the destabilizing effect of the Q39R and N61R supercharging mutations and the incorporation of C-terminal decahistidine tags on α-helical secondary structure. Cross-correlational regression analyses reveal that the amount of helical character in these mutants is key to N8's enhanced siRNA complexation and downstream delivery efficiency.

Quaia, the Gaia-unWISE Quasar Catalog: An All-sky Spectroscopic Quasar Sample

Kate Storey-Fisher, D. Hogg, Hans-Walter Rix, Anna-Christina Eilers, Giulio Fabbian, Michael R. Blanton, David Alonso

We present a new, all-sky quasar catalog, Quaia, that samples the largest comoving volume of any existing spectroscopic quasar sample. The catalog draws on the 6,649,162 quasar candidates identified by the Gaia mission that have redshift estimates from the space observatory's low-resolution blue photometer/red photometer spectra. This initial sample is highly homogeneous and complete, but has low purity, and 18% of even the bright (G < 20.0) confirmed quasars have discrepant redshift estimates (|Δz/(1 + z)| > 0.2) compared to those from the Sloan Digital Sky Survey (SDSS). In this work, we combine the Gaia candidates with unWISE infrared data (based on the Wide-field Infrared Survey Explorer survey) to construct a catalog useful for cosmological and astrophysical quasar studies. We apply cuts based on proper motions and colors, reducing the number of contaminants by a factor of about four. We improve the redshifts by training a k-nearest-neighbor model on SDSS redshifts, and achieve estimates on the G < 20.0 sample with only 6% (10%) catastrophic errors with |Δz/(1 + z)| > 0.2 (0.1), a reduction by a factor of about three (about two) compared to the Gaia redshifts. The final catalog has 1,295,502 quasars with G < 20.5, and 755,850 candidates in an even cleaner G < 20.0 sample, with accompanying rigorous selection function models. We compare Quaia to existing quasar catalogs, showing that its large effective volume makes it a highly competitive sample for cosmological large-scale structure analyses. The catalog is publicly available on Zenodo (DOI: 10.5281/zenodo.10403370).
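As an illustration of the two quantities named in this abstract, the sketch below computes the catastrophic-error fraction, the share of objects with |Δz/(1 + z)| above a threshold, and a plain k-nearest-neighbor redshift estimate. It is a minimal sketch under stated assumptions, not the Quaia pipeline: the feature vectors (for example, colors), the Euclidean distance, the choice of k, and taking the median of the neighbors' redshifts are all assumptions, and the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def catastrophic_fraction(z_pred: Sequence[float], z_true: Sequence[float],
                          threshold: float = 0.2) -> float:
    """Fraction of objects with |Δz / (1 + z_true)| above the threshold."""
    bad = sum(1 for zp, zt in zip(z_pred, z_true)
              if abs((zp - zt) / (1.0 + zt)) > threshold)
    return bad / len(z_true)


def knn_redshift(query: Sequence[float],
                 train_features: Sequence[Sequence[float]],
                 train_z: Sequence[float], k: int = 5) -> float:
    """Median redshift of the k training objects closest to `query`.

    Assumption (hypothetical helper): Euclidean distance in feature space
    and a median aggregate; the published model may differ in both.
    """
    # Pair each training object's distance with its redshift, nearest first.
    ranked = sorted(zip((math.dist(query, f) for f in train_features), train_z))
    return statistics.median(z for _, z in ranked[:k])
```

With `threshold=0.1` the same function gives the looser of the two error rates quoted above.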

Estimating Shape Distances on Neural Representations with Limited Samples

Dean A. Pospisil, B. Larsen, S. Harvey, A. Williams

Measuring geometric similarity between high-dimensional network representations is a topic of longstanding interest to neuroscience and deep learning. Although many methods have been proposed, only a few works have rigorously analyzed their statistical efficiency or quantified estimator uncertainty in data-limited regimes. Here, we derive upper and lower bounds on the worst-case convergence of standard estimators of shape distance—a measure of representational dissimilarity proposed by Williams et al. (2021). These bounds reveal the challenging nature of the problem in high-dimensional feature spaces. To overcome these challenges, we introduce a novel method-of-moments estimator with a tunable bias-variance tradeoff parameterized by an upper bound on bias. We show that this estimator achieves superior performance to standard estimators in simulation and on neural data, particularly in high-dimensional settings. Our theoretical work and estimator thus respectively define and dramatically expand the scope of neural data for which geometric similarity can be accurately measured.

ERK inhibits Cic repressor function via multisite phosphorylation

Sayantanee Paul, Khandan Ilkhani, S. Shvartsman, et al.

The receptor tyrosine kinase (RTK)/Extracellular Signal-Regulated Kinase (ERK) signaling pathway controls cell proliferation, differentiation, and survival. How ERK activation is relayed to its phosphorylation targets is not well understood. The transcriptional repressor Capicua (Cic) has emerged as a key target for ERK-mediated downregulation in Drosophila and mammals, and mutations in human CIC result in cancer and neurological diseases. Phosphorylation by ERK is critical for Cic downregulation, but the identities of phosphosites in Drosophila Cic are unknown. Here, we identify sites of phosphorylation in Cic that are directly targeted by ERK and validate their developmental functions in vivo using mutant Cic variants. Cic phosphosites are distributed throughout the length of the protein, and a group of centrally located sites appears to have a primary role in Cic downregulation. Cic mutated at 20 high-confidence sites behaves as a “super-repressor” in vivo that is largely insensitive to ERK-mediated downregulation, despite fully retaining the ability to bind to ERK. No single site is sufficient to turn off Cic activity; instead, we find that ERK must phosphorylate multiple sites in Cic simultaneously to achieve full downregulation. This multisite phosphorylation likely targets phosphodegrons that are recognized by ubiquitin ligases such as Ago/FBXW7 and contributes to Cic degradation. This study advances our understanding of the molecular mechanisms of signal interpretation downstream of the RTK/ERK signaling network.

March 14, 2024

Combining machine learning with structure-based protein design to predict and engineer post-translational modifications of proteins

Moritz Ertelt, V. Mulligan, et al.

Post-translational modifications (PTMs) of proteins play a vital role in their function and stability. These modifications influence protein folding, signaling, protein-protein interactions, enzyme activity, binding affinity, aggregation, degradation, and much more. To date, over 400 types of PTMs have been described, representing chemical diversity well beyond the genetically encoded amino acids. Such modifications pose a challenge to the successful design of proteins, but also represent a major opportunity to diversify the protein engineering toolbox. To this end, we first trained artificial neural networks (ANNs) to predict eighteen of the most abundant PTMs, including protein glycosylation, phosphorylation, methylation, and deamidation. In a second step, these models were implemented inside the computational protein modeling suite Rosetta, which allows flexible combination with existing protocols to model the modified sites and understand their impact on protein stability as well as function. Lastly, we developed a new design protocol that either maximizes or minimizes the predicted probability of a particular site being modified. We find that this combination of ANN prediction and structure-based design can enable the modification of existing, as well as the introduction of novel, PTMs. The potential applications of our work include, but are not limited to, glycan masking of epitopes, strengthening protein-protein interactions through phosphorylation, as well as protecting proteins from deamidation liabilities. These applications are especially important for the design of new protein therapeutics where PTMs can drastically change the therapeutic properties of a protein. Our work adds novel tools to Rosetta’s protein engineering toolbox that allow for the rational design of PTMs.

Ensemble Detection of DNA Engineering Signatures

Aaron Adler, Joel S. Bader, A. Persikov

Synthetic biology is creating genetically engineered organisms at an increasing rate for many potentially valuable applications, but this potential comes with the risk of misuse or accidental release. To begin to address this issue, we have developed a system called GUARDIAN that can automatically detect signatures of engineering in DNA sequencing data, and we have conducted a blinded test of this system using a curated Test and Evaluation (T&E) data set. GUARDIAN uses an ensemble approach based on the guiding principle that no single approach is likely to detect engineering with perfect accuracy. Critically, ensembling enables GUARDIAN to detect sequence inserts in 13 target organisms with specificity high enough that no subject matter expert (SME) review is required.
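The ensemble principle described here, in which several imperfect detectors vote and none is trusted alone, can be sketched as a k-of-n vote. This is a toy illustration, not GUARDIAN's design: the three detectors (exact matches to the EcoRI site GAATTC and to a T7 promoter sequence, plus a GC-rich window check), their parameters, and the vote threshold are hypothetical stand-ins chosen only to make the example runnable.

```python
from typing import Callable, Sequence

# A detector takes a DNA sequence and returns True if it "sees" engineering.
Detector = Callable[[str], bool]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_detector(motif: str) -> Detector:
    """Hypothetical detector: exact motif match on either strand."""
    rc = motif.translate(_COMPLEMENT)[::-1]  # reverse complement
    return lambda seq: motif in seq.upper() or rc in seq.upper()


def gc_window_detector(window: int = 20, threshold: float = 0.75) -> Detector:
    """Hypothetical detector: any window whose GC fraction exceeds the threshold."""
    def detect(seq: str) -> bool:
        s = seq.upper()
        for i in range(len(s) - window + 1):
            w = s[i:i + window]
            if (w.count("G") + w.count("C")) / window > threshold:
                return True
        return False
    return detect


def ensemble_flag(seq: str, detectors: Sequence[Detector], min_votes: int) -> bool:
    """Flag a sequence only when at least `min_votes` detectors agree."""
    votes = sum(1 for detect in detectors if detect(seq))
    return votes >= min_votes
```

Requiring agreement from more than one detector trades some sensitivity for specificity, which is the kind of balance the abstract attributes to ensembling.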

A cell autonomous regulator of neuronal excitability modulates tau in Alzheimer’s disease vulnerable neurons

Patricia Rodriguez-Rodriguez, Luis Enrique Arroyo-Garcia, O. Troyanskaya, et al.

Neurons from layer II of the entorhinal cortex (ECII) are the first to accumulate tau protein aggregates and degenerate during prodromal Alzheimer’s disease. Gaining insight into the molecular mechanisms underlying this vulnerability will help reveal genes and pathways at play during incipient stages of the disease. Here, we use a data-driven functional genomics approach to model ECII neurons in silico and identify the proto-oncogene DEK as a regulator of tau pathology.

We show that epigenetic changes caused by Dek silencing alter activity-induced transcription, with major effects on neuronal excitability. This is accompanied by the gradual accumulation of tau in the somatodendritic compartment of mouse ECII neurons in vivo, reactivity of surrounding microglia, and microglia-mediated neuron loss. These features are all characteristic of early Alzheimer’s disease.

The existence of a cell-autonomous mechanism linking Alzheimer’s disease pathogenic mechanisms in the precise neuron type where the disease starts provides unique evidence that synaptic homeostasis dysregulation is of central importance in the onset of tau pathology in Alzheimer’s disease.

Adsorption and vibrational spectroscopy of CO on the surface of MgO from periodic local coupled-cluster theory

Hong-Zhou Ye, T. Berkelbach

The adsorption of CO on the surface of MgO has long been a model problem in surface chemistry. Here, we report periodic Gaussian-based calculations for this problem using second-order perturbation theory (MP2) and coupled-cluster theory with single and double excitations (CCSD) and perturbative triple excitations [CCSD(T)], with the latter two performed using a recently developed extension of the local natural orbital approximation to problems with periodic boundary conditions. The low cost of periodic local correlation calculations allows us to calculate the full CCSD(T) binding curve of CO approaching the surface of MgO (and thus the adsorption energy) and the two-dimensional potential energy surface (PES) as a function of the distance from the surface and the CO stretching coordinate. From the PES, we obtain the fundamental vibrational frequency of CO on MgO, whose shift from the gas-phase value is a common experimental probe of surface adsorption. We find that CCSD(T) correctly predicts a positive frequency shift upon adsorption of +14.7 cm⁻¹, in excellent agreement with the experimental shift of +14.3 cm⁻¹. We use our CCSD(T) results to assess the accuracy of MP2, CCSD, and several density functional theory (DFT) approximations, including exchange-correlation functionals and dispersion corrections. We find that MP2 and CCSD yield reasonable binding energies and frequency shifts, whereas many DFT calculations overestimate the magnitude of the adsorption energy by 5 to 15 kJ/mol and predict a negative frequency shift of about −20 cm⁻¹, which we attribute to self-interaction-induced delocalization errors that are mildly ameliorated with hybrid functionals. Our findings highlight the accuracy and computational efficiency of periodic local correlation as a way to simulate surface chemistry with accurate wavefunction methods.
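The abstract obtains the CO vibrational frequency from a two-dimensional PES. A much simpler one-dimensional, harmonic version of that step is sketched below for illustration: estimate the force constant k as the curvature of an energy curve at its minimum by central finite differences, then convert it to a wavenumber with ν̃ = (1/2πc)·√(k/μ). This ignores anharmonicity and the CO-surface coordinate, so it is not the paper's fundamental-frequency calculation; the function names and the Morse-potential test curve are hypothetical.

```python
import math

AMU_KG = 1.66053906660e-27   # atomic mass constant, kg
C_CM_PER_S = 2.99792458e10   # speed of light, cm/s


def reduced_mass_amu(m1: float, m2: float) -> float:
    """Reduced mass of a diatomic, in amu."""
    return m1 * m2 / (m1 + m2)


def force_constant(energy, r0: float, h: float) -> float:
    """Curvature d2E/dr2 at r0 by central differences (N/m if E is in J and r in m)."""
    return (energy(r0 + h) - 2.0 * energy(r0) + energy(r0 - h)) / h**2


def harmonic_wavenumber(k_n_per_m: float, mu_amu: float) -> float:
    """Harmonic wavenumber in cm^-1: (1 / 2*pi*c) * sqrt(k / mu)."""
    omega = math.sqrt(k_n_per_m / (mu_amu * AMU_KG))  # angular frequency, rad/s
    return omega / (2.0 * math.pi * C_CM_PER_S)


def morse(depth: float, a: float, r_e: float):
    """Morse curve E(r) = D (1 - exp(-a (r - r_e)))^2; its exact curvature at r_e is 2 D a^2."""
    return lambda r: depth * (1.0 - math.exp(-a * (r - r_e))) ** 2
```

A frequency shift in this harmonic picture is `harmonic_wavenumber(k_ads, mu) - harmonic_wavenumber(k_gas, mu)`; a positive value is a blue shift, the sign the abstract reports for CO on MgO. With k ≈ 1902 N/m, roughly the textbook harmonic force constant of gas-phase CO, and μ from masses 12.0 and 15.995 amu, the function returns about 2170 cm⁻¹.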

Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

Rylan Schaeffer, Berivan Isik, Dhruv Bhandarkar Pai, Andres Carranza, Victor Lecomte, Alyssa Unell, Mikail Khona, T. Yerxa, Y. LeCun, S. Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo

Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is intriguing because it does not fit neatly into any of the commonplace MVSSL lineages, instead originating from a statistical mechanical perspective on the linear separability of data manifolds. In this paper, we seek to improve our understanding and our utilization of MMCR. To better understand MMCR, we leverage tools from high-dimensional probability to demonstrate that MMCR incentivizes alignment and uniformity of learned embeddings. We then leverage tools from information theory to show that such embeddings maximize a well-known lower bound on mutual information between views, thereby connecting the geometric perspective of MMCR to the information-theoretic perspective commonly discussed in MVSSL. To better utilize MMCR, we mathematically predict and experimentally confirm non-monotonic changes in the pretraining loss akin to double descent but with respect to atypical hyperparameters. We also discover compute scaling laws that enable predicting the pretraining loss as a function of gradient steps, batch size, embedding dimension, and number of views. We then show that MMCR, originally applied to image data, is performant on multimodal image-text data. By more deeply understanding the theoretical and empirical behavior of MMCR, our work reveals insights into improving MVSSL methods.
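Assuming the standard MMCR objective, in which each sample's view embeddings are projected onto the unit sphere and averaged into a centroid, and the loss is the negative nuclear norm (sum of singular values) of the matrix of centroids, the sketch below computes that loss in pure Python. The nuclear norm is taken from the eigenvalues of the Gram matrix CᵀC, found with cyclic Jacobi rotations; all names are hypothetical, and a practical implementation would instead use an autodiff framework's SVD so the loss can be backpropagated.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _sym_eigvals(a: Matrix, sweeps: int = 50) -> List[float]:
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def mmcr_loss(views: Sequence[Sequence[Sequence[float]]]) -> float:
    """Negative nuclear norm of the centroid matrix; views[i][j] embeds view j of sample i."""
    centroids = []
    for sample_views in views:
        unit = [_normalize(v) for v in sample_views]
        dim = len(unit[0])
        centroids.append([sum(u[k] for u in unit) / len(unit) for k in range(dim)])
    dim = len(centroids[0])
    # Gram matrix C^T C; its eigenvalues are the squared singular values of C.
    gram = [[sum(c[i] * c[j] for c in centroids) for j in range(dim)] for i in range(dim)]
    return -sum(math.sqrt(max(ev, 0.0)) for ev in _sym_eigvals(gram))
```

Spreading two samples' centroids over orthogonal directions gives a loss of −2, while collapsing both onto one direction gives only −√2, illustrating the uniformity pressure the abstract mentions; disagreeing views shrink a centroid's norm and likewise weaken the loss, which corresponds to alignment.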

On the universality of neural encodings in CNNs

F. Guth, Brice Ménard

We explore the universality of neural encodings in convolutional neural networks trained on image classification tasks. We develop a procedure to directly compare the learned weights rather than their representations. It is based on a factorization of spatial and channel dimensions and measures the similarity of aligned weight covariances. We show that, for a range of layers of VGG-type networks, the learned eigenvectors appear to be universal across different natural image datasets. Our results suggest the existence of a universal neural encoding for natural images. They explain, at a more fundamental level, the success of transfer learning. Our work shows that, instead of aiming at maximizing the performance of neural networks, one can alternatively attempt to maximize the universality of the learned encoding, in order to build a principled foundation model.
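One ingredient of the comparison described here, measuring the similarity of weight covariances, can be sketched for the simplest case: a first convolutional layer, whose inputs (pixel channels) are already aligned across networks. Each filter is flattened to a vector w_o, the uncentered covariance C = (1/n)·Σ_o w_o w_oᵀ is formed, and two networks are compared by the cosine similarity of their covariance matrices under the Frobenius inner product. This omits the paper's alignment procedure and its spatial/channel factorization, so treat it as a hypothetical simplification; the function names are illustrative.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def weight_covariance(filters: Sequence[Sequence[float]]) -> Matrix:
    """Uncentered covariance (1/n) * sum_o w_o w_o^T over flattened filters w_o."""
    n, d = len(filters), len(filters[0])
    return [[sum(w[i] * w[j] for w in filters) / n for j in range(d)] for i in range(d)]


def covariance_similarity(c1: Matrix, c2: Matrix) -> float:
    """Cosine similarity of two covariance matrices under the Frobenius inner product."""
    inner = sum(a * b for r1, r2 in zip(c1, c2) for a, b in zip(r1, r2))
    n1 = math.sqrt(sum(a * a for r in c1 for a in r))
    n2 = math.sqrt(sum(b * b for r in c2 for b in r))
    return inner / (n1 * n2)
```

Because C sums over output filters, it does not change when those filters are reordered, which is one reason covariances are a convenient object to compare across independently trained networks.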
