Publications

Ensemble Detection of DNA Engineering Signatures

Aaron Adler, Joel S. Bader, A. Persikov

Synthetic biology is creating genetically engineered organisms at an increasing rate for many potentially valuable applications, but this potential comes with the risk of misuse or accidental release. To begin to address this issue, we have developed a system called GUARDIAN that can automatically detect signatures of engineering in DNA sequencing data, and we have conducted a blinded test of this system using a curated Test and Evaluation (T&E) data set. GUARDIAN uses an ensemble approach based on the guiding principle that no single approach is likely to be able to detect engineering with perfect accuracy. Critically, ensembling enables GUARDIAN to detect sequence inserts in 13 target organisms with a high degree of specificity that requires no subject matter expert (SME) review.

A cell autonomous regulator of neuronal excitability modulates tau in Alzheimer’s disease vulnerable neurons

Patricia Rodriguez-Rodriguez, Luis Enrique Arroyo-Garcia, O. Troyanskaya, et al.

Neurons from layer II of the entorhinal cortex (ECII) are the first to accumulate tau protein aggregates and degenerate during prodromal Alzheimer’s disease. Gaining insight into the molecular mechanisms underlying this vulnerability will help reveal genes and pathways at play during incipient stages of the disease. Here, we use a data-driven functional genomics approach to model ECII neurons in silico and identify the proto-oncogene DEK as a regulator of tau pathology.

We show that epigenetic changes caused by Dek silencing alter activity-induced transcription, with major effects on neuronal excitability. This is accompanied by the gradual accumulation of tau in the somatodendritic compartment of mouse ECII neurons in vivo, reactivity of surrounding microglia, and microglia-mediated neuron loss. These features are all characteristic of early Alzheimer’s disease.

The existence of a cell-autonomous mechanism linking Alzheimer’s disease pathogenic mechanisms in the precise neuron type where the disease starts provides unique evidence that synaptic homeostasis dysregulation is of central importance in the onset of tau pathology in Alzheimer’s disease.

Adsorption and vibrational spectroscopy of CO on the surface of MgO from periodic local coupled-cluster theory

Hong-Zhou Ye, T. Berkelbach

The adsorption of CO on the surface of MgO has long been a model problem in surface chemistry. Here, we report periodic Gaussian-based calculations for this problem using second-order perturbation theory (MP2) and coupled-cluster theory with single and double excitations (CCSD) and perturbative triple excitations [CCSD(T)], with the latter two performed using a recently developed extension of the local natural orbital approximation to problems with periodic boundary conditions. The low cost of periodic local correlation calculations allows us to calculate the full CCSD(T) binding curve of CO approaching the surface of MgO (and thus the adsorption energy) and the two-dimensional potential energy surface (PES) as a function of the distance from the surface and the CO stretching coordinate. From the PES, we obtain the fundamental vibrational frequency of CO on MgO, whose shift from the gas phase value is a common experimental probe of surface adsorption. We find that CCSD(T) correctly predicts a positive frequency shift upon adsorption of $+14.7~\textrm{cm}^{-1}$, in excellent agreement with the experimental shift of $+14.3~\textrm{cm}^{-1}$. We use our CCSD(T) results to assess the accuracy of MP2, CCSD, and several density functional theory (DFT) approximations, including exchange-correlation functionals and dispersion corrections. We find that MP2 and CCSD yield reasonable binding energies and frequency shifts, whereas many DFT calculations overestimate the magnitude of the adsorption energy by $5$--$15$~kJ/mol and predict a negative frequency shift of about $-20~\textrm{cm}^{-1}$, which we attribute to self-interaction-induced delocalization errors that are mildly ameliorated with hybrid functionals. Our findings highlight the accuracy and computational efficiency of the periodic local correlation for the simulation of surface chemistry with accurate wavefunction methods.

Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

Rylan Schaeffer, Berivan Isik, Dhruv Bhandarkar Pai, Andres Carranza, Victor Lecomte, Alyssa Unell, Mikail Khona, T. Yerxa, Y. LeCun, S. Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo

Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is intriguing because it does not fit neatly into any of the commonplace MVSSL lineages, instead originating from a statistical mechanical perspective on the linear separability of data manifolds. In this paper, we seek to improve our understanding and our utilization of MMCR. To better understand MMCR, we leverage tools from high dimensional probability to demonstrate that MMCR incentivizes alignment and uniformity of learned embeddings. We then leverage tools from information theory to show that such embeddings maximize a well-known lower bound on mutual information between views, thereby connecting the geometric perspective of MMCR to the information-theoretic perspective commonly discussed in MVSSL. To better utilize MMCR, we mathematically predict and experimentally confirm non-monotonic changes in the pretraining loss akin to double descent but with respect to atypical hyperparameters. We also discover compute scaling laws that enable predicting the pretraining loss as a function of gradient steps, batch size, embedding dimension, and number of views. We then show that MMCR, originally applied to image data, is performant on multimodal image-text data. By more deeply understanding the theoretical and empirical behavior of MMCR, our work reveals insights on improving MVSSL methods.

On the universality of neural encodings in CNNs

F. Guth, Brice Ménard

We explore the universality of neural encodings in convolutional neural networks trained on image classification tasks. We develop a procedure to directly compare the learned weights rather than their representations. It is based on a factorization of spatial and channel dimensions and measures the similarity of aligned weight covariances. We show that, for a range of layers of VGG-type networks, the learned eigenvectors appear to be universal across different natural image datasets. Our results suggest the existence of a universal neural encoding for natural images. They explain, at a more fundamental level, the success of transfer learning. Our work shows that, instead of aiming at maximizing the performance of neural networks, one can alternatively attempt to maximize the universality of the learned encoding, in order to build a principled foundation model.

Disentangling Recurrent Neural Dynamics with Stochastic Representational Geometry

D. Lipshutz, A. Nejatbakhsh, A. Williams

Uncovering and comparing the dynamical mechanisms that support neural processing remains a key challenge in the analysis of biological and artificial neural systems. However, measures of representational (dis)similarity in neural systems often assume that neural responses are static in time. Here, we show that stochastic shape distances (SSDs; Duong et al., 2023), which were developed to compare noisy neural responses to static inputs and lack an explicit notion of temporal structure, are well equipped to compare noisy dynamics. In two examples, we use SSDs, which interpolate between comparing mean trajectories and second-order fluctuations about mean trajectories, to disentangle recurrent versus external contributions to noisy dynamics.

Escaping saddle points efficiently with occupation-time-adapted perturbation

Xin Guo, J. Han, Mahan Tajrobehkar, Wenpin Tang

Motivated by the super-diffusivity of self-repelling random walk, which has roots in statistical physics, this paper develops a new perturbation mechanism for optimization algorithms. In this mechanism, perturbations are adapted to the history of states via the notion of occupation time. After integrating this mechanism into the framework of perturbed gradient descent (PGD) and perturbed accelerated gradient descent (PAGD), two new algorithms are proposed: perturbed gradient descent adapted to occupation time (PGDOT) and its accelerated version (PAGDOT). PGDOT and PAGDOT are guaranteed to avoid getting stuck at non-degenerate saddle points, and are shown to converge to second-order stationary points at least as fast as PGD and PAGD, respectively. The theoretical analysis is corroborated by empirical studies in which the new algorithms consistently escape saddle points and outperform not only their counterparts, PGD and PAGD, but also other popular alternatives including stochastic gradient descent, Adam, and several state-of-the-art adaptive gradient methods.
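
As a rough illustration of the baseline named in the abstract above, the sketch below implements plain perturbed gradient descent (PGD) on a toy function with a saddle point. It is not the PGDOT or PAGDOT algorithm from the paper: the occupation-time adaptation is not reproduced here. The test function, the helper names `grad` and `descend`, and all parameter values (`step`, `grad_tol`, `radius`, `wait`) are assumptions chosen only for this example.

```python
import math
import random


def grad(x, y):
    """Gradient of the toy function f(x, y) = x**2 + y**4 / 4 - y**2 / 2.

    f has a non-degenerate saddle at (0, 0) and minima at (0, +1) and (0, -1).
    """
    return 2.0 * x, y**3 - y


def descend(x, y, perturb, step=0.1, iters=500,
            grad_tol=1e-3, radius=1e-2, wait=20, seed=0):
    """Gradient descent, optionally with random perturbations (plain PGD).

    When the gradient norm is at most grad_tol and at least `wait`
    iterations have passed since the last perturbation, the iterate is
    moved to a point drawn uniformly from a disk of the given radius.
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    last_kick = -wait
    for t in range(iters):
        gx, gy = grad(x, y)
        small = math.hypot(gx, gy) <= grad_tol
        if perturb and small and t - last_kick >= wait:
            # Uniform sample from a disk: random angle, sqrt-scaled radius.
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = radius * math.sqrt(rng.random())
            x += r * math.cos(angle)
            y += r * math.sin(angle)
            last_kick = t
            gx, gy = grad(x, y)
        x -= step * gx
        y -= step * gy
    return x, y


if __name__ == "__main__":
    # Started exactly at the saddle, plain gradient descent never moves.
    print("plain GD:", descend(0.0, 0.0, perturb=False))
    # With perturbations, the iterate leaves the saddle and settles
    # near one of the two minima.
    print("PGD:     ", descend(0.0, 0.0, perturb=True))
```

In this toy setting, unperturbed gradient descent stays at the saddle because the gradient there is exactly zero, while the perturbed variant ends close to one of the minima at (0, ±1). How perturbations should depend on the history of visited states is the subject of the paper and is left out of this sketch.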

Computational Design of Phosphotriesterase Improves V-Agent Degradation Efficiency

Jacob Kronenberg, Stanley Chu, D. Renfrew, et al.

Organophosphates (OPs) are a class of neurotoxic acetylcholinesterase inhibitors including widely used pesticides as well as nerve agents such as VX and VR. Current treatment of these toxins relies on reactivating acetylcholinesterase, which remains ineffective. Enzymatic scavengers are of interest for their ability to degrade OPs systemically before they reach their target. Here we describe a library of computationally designed variants of phosphotriesterase (PTE), an enzyme that is known to break down OPs. The mutations G208D, F104A, K77A, A80V, H254G, and I274N broadly improve catalytic efficiency of VX and VR hydrolysis without impacting the structure of the enzyme. The mutation I106A improves catalysis of VR, and L271E abolishes activity, likely due to disruptions of PTE's structure. This study elucidates the importance of these residues and contributes to the design of enzymatic OP scavengers with improved efficiency.

Precision Medicine in Nephrology: An Integrative Framework of Multidimensional Data in the Kidney Precision Medicine Project

Tarek M. El-Achkar, Michael T. Eadon, R. Sealfon

Chronic kidney disease (CKD) and acute kidney injury (AKI) are heterogeneous syndromes defined clinically by serial measures of kidney function. Each condition possesses strong histopathologic associations, including glomerular obsolescence or acute tubular necrosis, respectively. Despite such characterization, there remains wide variation in patient outcomes and treatment responses. Precision medicine efforts, as exemplified by the Kidney Precision Medicine Project (KPMP), have begun to establish evolving, spatially anchored, cellular and molecular atlases of the cell types, states, and niches of the kidney in health and disease. The KPMP atlas provides molecular context for CKD and AKI disease drivers and will help define subtypes of disease that are not readily apparent from canonical functional or histopathologic characterization but instead are appreciable through advanced clinical phenotyping, pathomic, transcriptomic, proteomic, epigenomic, and metabolomic interrogation of kidney biopsy samples. This perspective outlines the structure of the KPMP, its approach to the integration of these diverse datasets, and its major outputs relevant to future patient care.

Adequacy of the dynamical mean field theory for low density and Dirac materials

The qualitative reliability of the dynamical mean field theory (DMFT) is investigated for systems in which either the actual carrier density or the effective carrier density is low, by comparing the exact perturbative and dynamical mean field expressions of electron scattering rates and optical conductivities. We study two interacting systems: tight binding models in which the chemical potential is near a band edge and Dirac systems in which the chemical potential is near the Dirac point. In both systems it is found that DMFT underestimates the low frequency, near-Fermi surface single particle scattering rate by a factor proportional to the particle density. The quasiparticle effective mass is qualitatively incorrect for the low density tight binding model but not necessarily for Dirac systems. The dissipative part of the optical conductivity is more subtle: in the exact calculation vertex corrections, typically neglected in DMFT calculations, suppress the low frequency optical absorption, compensating for some of the DMFT underestimate of the scattering rate. The role of vertex corrections in calculating the conductivity for Dirac systems is clarified and a systematic discussion is given of the approach to the Galilean/Lorentz invariant low density limit. Relevance to recent calculations related to Weyl metals is discussed.
