298 Publications

Multiple-Allele MHC Class II Epitope Engineering by a Molecular Dynamics-Based Evolution Protocol

Rodrigo Ochoa, Victoria Alves Santos Lunardelli, Daniela Santoro Rosa, Alessandro Laio, P. Cossio

Epitopes that bind simultaneously to all human alleles of Major Histocompatibility Complex class II (MHC II) are considered one of the key factors for the development of improved vaccines and cancer immunotherapies. To engineer MHC II multiple-allele binders, we developed a protocol called PanMHC-PARCE, based on the unsupervised optimization of the epitope sequence by single-point mutations, parallel explicit-solvent molecular dynamics simulations and scoring of the MHC II-epitope complexes. The key idea is accepting mutations that not only improve the affinity but also reduce the affinity gap between the alleles. We applied this methodology to enhance a Plasmodium vivax epitope for multiple-allele binding. In vitro rate-binding assays showed that four engineered peptides were able to bind with improved affinity toward multiple human MHC II alleles. Moreover, we demonstrated that mice immunized with the peptides exhibited an interferon-gamma cellular immune response. Overall, the method enables the engineering of peptides with improved binding properties that can be used for the generation of new immunotherapies.
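As a schematic illustration of the acceptance rule described in this abstract, the sketch below accepts a single-point mutation only if it both improves the mean binding score across the alleles and narrows the gap between the best- and worst-scored alleles. The scoring function, score values and allele names are hypothetical placeholders; in the actual protocol the scores come from parallel explicit-solvent molecular dynamics simulations of each MHC II-epitope complex.

```python
# Minimal sketch of a multi-allele acceptance rule consistent with the
# abstract above. Scores and allele names are hypothetical placeholders;
# lower score = stronger predicted binding.

def accept_mutation(old_scores, new_scores):
    """Accept a single-point mutation if it improves the mean binding score
    across alleles and shrinks the gap between the worst- and best-scored
    alleles."""
    old_mean = sum(old_scores.values()) / len(old_scores)
    new_mean = sum(new_scores.values()) / len(new_scores)
    old_gap = max(old_scores.values()) - min(old_scores.values())
    new_gap = max(new_scores.values()) - min(new_scores.values())
    return new_mean <= old_mean and new_gap <= old_gap

# Hypothetical scores for three human MHC II alleles before and after a mutation.
before = {"DRB1*01:01": -6.2, "DRB1*04:01": -3.1, "DRB1*07:01": -4.0}
after = {"DRB1*01:01": -6.0, "DRB1*04:01": -4.5, "DRB1*07:01": -4.4}
print(accept_mutation(before, after))  # True: the mean improves and the gap shrinks
```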

Vitruvion: A Generative Model of Parametric CAD Sketches

Ari Seff, W. Zhou, Nick Richardson, Ryan P. Adams

Parametric computer-aided design (CAD) tools are the predominant way that engineers specify physical structures, from bicycle pedals to airplanes to printed circuit boards. The key characteristic of parametric CAD is that design intent is encoded not only via geometric primitives, but also by parameterized constraints between the elements. This relational specification can be viewed as the construction of a constraint program, allowing edits to coherently propagate to other parts of the design. Machine learning offers the intriguing possibility of accelerating the design process via generative modeling of these structures, enabling new tools such as autocompletion, constraint inference, and conditional synthesis. In this work, we present such an approach to generative modeling of parametric CAD sketches, which constitute the basic computational building blocks of modern mechanical design. Our model, trained on real-world designs from the SketchGraphs dataset, autoregressively synthesizes sketches as sequences of primitives, with initial coordinates, and constraints that reference back to the sampled primitives. As samples from the model match the constraint graph representation used in standard CAD software, they may be directly imported, solved, and edited according to downstream design tasks. In addition, we condition the model on various contexts, including partial sketches (primers) and images of hand-drawn sketches. Evaluation of the proposed approach demonstrates its ability to synthesize realistic CAD sketches and its potential to aid the mechanical design workflow.
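To make the sequence representation concrete, the snippet below shows one hypothetical way a parametric sketch could be encoded as a list of primitives followed by constraints that reference primitives by index; the field names and values are invented for illustration and are not Vitruvion's actual token vocabulary.

```python
# Hypothetical encoding of a parametric sketch: primitives first, then
# constraints that point back to primitives by index. Field names and
# values are invented for illustration only.
primitives = [
    {"id": 0, "type": "line", "start": (0.0, 0.0), "end": (1.0, 0.0)},
    {"id": 1, "type": "line", "start": (1.0, 0.0), "end": (1.0, 1.0)},
    {"id": 2, "type": "arc", "center": (0.5, 0.5), "radius": 0.5},
]
constraints = [
    {"type": "coincident", "refs": [0, 1]},     # endpoints of lines 0 and 1 meet
    {"type": "perpendicular", "refs": [0, 1]},  # lines 0 and 1 at right angles
    {"type": "tangent", "refs": [1, 2]},        # line 1 tangent to arc 2
]
# An autoregressive model factorizes such a sketch as
# p(primitives) * p(constraints | primitives), emitting one token at a time
# and conditioning each step on everything generated so far.
```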

Simulation-based inference of single-molecule force spectroscopy

Lars Dingeldein, P. Cossio, Roberto Covino

Single-molecule force spectroscopy (smFS) is a powerful approach to studying molecular self-organization. However, the coupling of the molecule with the ever-present experimental device introduces artifacts that complicate the interpretation of these experiments. Performing statistical inference to learn hidden molecular properties is challenging because these measurements produce non-Markovian time series, and even minimal models lead to intractable likelihoods. To overcome these challenges, we developed a computational framework built on novel statistical methods called simulation-based inference (SBI). SBI enabled us to directly estimate the Bayesian posterior and extract reduced quantitative models from smFS by encoding a mechanistic model into a simulator in combination with probabilistic deep learning. Using synthetic data, we could systematically disentangle the measurement of hidden molecular properties from experimental artifacts. The integration of physical models with machine-learning density estimation is general, transparent, easy to use, and broadly applicable to other types of biophysical experiments.
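The sketch below illustrates the overall workflow in a deliberately simplified form: a toy simulator maps a hidden parameter to a noisy observable, and the posterior over that parameter is approximated by plain rejection sampling, standing in for the probabilistic deep-learning density estimation used in the paper. The simulator, parameter ranges and tolerance are invented for illustration and do not correspond to a real smFS model.

```python
# Toy stand-in for a simulation-based inference workflow. A hypothetical
# simulator maps a parameter to a noisy observable; rejection sampling
# stands in for the neural density estimation used in practice.
import numpy as np

rng = np.random.default_rng(0)

def simulator(k, rng):
    """Hypothetical model: noisy observable growing with a 'stiffness' k."""
    return 2.0 * k + rng.normal(0.0, 0.5, size=np.shape(k))

prior_samples = rng.uniform(0.0, 5.0, size=100_000)  # prior over the hidden parameter
x_observed = 6.0                                     # synthetic 'measurement'

# Keep parameters whose simulated output lands near the observation.
simulated = simulator(prior_samples, rng)
posterior_samples = prior_samples[np.abs(simulated - x_observed) < 0.2]
print(posterior_samples.mean(), posterior_samples.std())  # approximate posterior summary
```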

Wavelet Moments for Cosmological Parameter Estimation

M. Eickenberg, Erwan Allys, Azadeh Moradinezhad Dizgah, Pablo Lemos, E. Massara, Muntazir Abidi, ChangHoon Hahn, S. Hassan, B. Régaldo-Saint Blancard, S. Ho, S. Mallat, J. Andén, F. Villaescusa-Navarro

Extracting non-Gaussian information from the non-linear regime of structure formation is key to fully exploiting the rich data from upcoming cosmological surveys probing the large-scale structure of the universe. However, due to theoretical and computational complexities, this remains one of the main challenges in analyzing observational data. We present a set of summary statistics for cosmological matter fields based on 3D wavelets to tackle this challenge. These statistics are computed as the spatial average of the complex modulus of the 3D wavelet transform raised to a power q and are therefore known as invariant wavelet moments. The 3D wavelets are constructed to be radially band-limited and separable on a spherical polar grid and come in three types: isotropic, oriented, and harmonic. In the Fisher forecast framework, we evaluate the performance of these summary statistics on matter fields from the Quijote suite, where they are shown to reach state-of-the-art parameter constraints on the base ΛCDM parameters, as well as the sum of neutrino masses. We show that we can improve constraints by a factor of 5 to 10 in all parameters with respect to the power spectrum baseline.

arXiv e-prints
April 15, 2022
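In symbols, one plausible way to write the statistic described above (with notation chosen here for illustration rather than copied from the paper) is

\[
S_q(\lambda) \;=\; \Big\langle \big| (\delta \ast \psi_\lambda)(\mathbf{x}) \big|^{\,q} \Big\rangle_{\mathbf{x}},
\]

where \(\delta\) is the 3D matter field, \(\psi_\lambda\) a wavelet indexed by its scale, orientation or harmonic label \(\lambda\), \(\ast\) denotes spatial convolution, and the average is taken over positions \(\mathbf{x}\).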

A general sample complexity analysis of vanilla policy gradient

Rui Yuan, R. M. Gower, Alessandro Lazaric

We adapt recent tools developed for the analysis of Stochastic Gradient Descent (SGD) in non-convex optimization to obtain convergence and sample complexity guarantees for the vanilla policy gradient (PG). Our only assumptions are that the expected return is smooth w.r.t. the policy parameters, that its H-step truncated gradient is close to the exact gradient, and a certain ABC assumption. This assumption requires the second moment of the estimated gradient to be bounded by A ≥ 0 times the suboptimality gap, B ≥ 0 times the norm of the full-batch gradient, and an additive constant C ≥ 0, or any combination of the aforementioned. We show that the ABC assumption is more general than the assumptions on the policy space commonly used to prove convergence to a stationary point. We provide a single convergence theorem that recovers the O(ε^{-4}) sample complexity of PG. Our results also afford greater flexibility in the choice of hyperparameters such as the step size, and place no restriction on the batch size m, including the single-trajectory case (i.e., m = 1). We then instantiate our theorem in different settings, where we both recover existing results and obtain improved sample complexities, e.g., for convergence to the global optimum for Fisher-non-degenerate parameterized policies.
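Written out, the ABC condition described above takes the form of a bound on the second moment of the gradient estimator; the rendering below follows the standard expected-smoothness form (e.g., with the squared norm of the full-batch gradient), so constant factors and notational details may differ from the paper's exact statement:

\[
\mathbb{E}\Big[\big\|\widehat{\nabla}J(\theta)\big\|^{2}\Big] \;\le\; A\big(J^{*} - J(\theta)\big) \;+\; B\,\big\|\nabla J(\theta)\big\|^{2} \;+\; C,
\qquad A, B, C \ge 0,
\]

where \(J(\theta)\) is the expected return, \(J^{*}\) its optimal value, and \(\widehat{\nabla}J(\theta)\) the estimated (truncated, sampled) policy gradient.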

SAN: Stochastic Average Newton Algorithm for Minimizing Finite Sums

Jiabin Chen, Rui Yuan, Guillaume Garrigos, R. M. Gower

We present a principled approach for designing stochastic Newton methods for solving finite sum optimization problems. Our approach has two steps. First, we re-write the stationarity conditions as a system of nonlinear equations that associates each data point to a new row. Second, we apply a Subsampled Newton-Raphson method to solve this system of nonlinear equations. Using our approach, we develop a new Stochastic Average Newton (SAN) method, which is incremental by design, in that it requires only a single data point per iteration. It is also cheap to implement when solving regularized generalized linear models, with a cost per iteration of the order of the number of parameters. We show through numerical experiments that SAN requires no prior knowledge of the problem and no parameter tuning, while remaining competitive with classical variance-reduced gradient methods (e.g., SAG and SVRG) and incremental Newton and quasi-Newton methods (e.g., SNM, IQN).
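One way to carry out the rewriting described in this abstract, for a finite sum \(f(w) = \tfrac{1}{n}\sum_{i=1}^{n} f_i(w)\), is to introduce one auxiliary vector \(\alpha_i\) per data point, so that the stationarity condition becomes a larger nonlinear system with one block of equations per data point (notation chosen here for illustration):

\[
\nabla f(w) = 0
\quad\Longleftrightarrow\quad
\frac{1}{n}\sum_{i=1}^{n}\alpha_i = 0
\;\;\text{and}\;\;
\alpha_i - \nabla f_i(w) = 0, \quad i = 1,\dots,n.
\]

Sampling one of these equations per iteration and applying a Newton-Raphson step then touches only a single data point at a time, which is what makes the resulting method incremental.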

Debye source representations for type-I superconductors, I: The static type I case

In this note, we analyze the classical magneto-static approach to the theory of type I superconductors, and a Debye source representation that can be used numerically to solve the resultant equations. We also prove that one of the fields, J−, found within the superconductor via the London equations, is the physical current in that the outgoing part of the magnetic field is given as the Biot-Savart integral of J−. Finally, we compute the static currents for moderate values of London penetration depth, λL, for a sphere, a stellarator-like geometry and a two-holed torus.
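Concretely, the statement that the outgoing field is the Biot-Savart integral of \(J^{-}\) means (written here in SI units with the standard prefactor; the paper's normalization may differ):

\[
\mathbf{B}_{\mathrm{out}}(\mathbf{x}) \;=\; \frac{\mu_0}{4\pi}\int_{\Omega}
\frac{\mathbf{J}^{-}(\mathbf{y}) \times (\mathbf{x}-\mathbf{y})}{|\mathbf{x}-\mathbf{y}|^{3}}\,\mathrm{d}\mathbf{y},
\]

where \(\Omega\) is the superconducting region carrying the current \(\mathbf{J}^{-}\).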

Multi-omic analysis along the gut-brain axis points to a functional architecture of autism

B. Carpenter, James T. Morton, Dong-Min Jin, Robert H. Mills, Yan Shao, Gibraan Rahman, Daniel McDonald, Kirsten Berding, Brittany D. Needham, et al.

Autism is a highly heritable neurodevelopmental disorder characterized by heterogeneous cognitive, behavioral and communication impairments. Disruption of the gut-brain axis (GBA) has been implicated in autism, with dozens of cross-sectional microbiome and other omic studies revealing autism-specific profiles along the GBA albeit with little agreement in composition or magnitude. To explore the functional architecture of autism, we developed an age and sex-matched Bayesian differential ranking algorithm that identified autism-specific profiles across 10 cross-sectional microbiome datasets and 15 other omic datasets, including dietary patterns, metabolomics, cytokine profiles, and human brain expression profiles. The analysis uncovered a highly significant, functional architecture along the GBA that encapsulated the overall heterogeneity of autism phenotypes. This architecture was determined by autism-specific amino acid, carbohydrate and lipid metabolism profiles predominantly encoded by microbial species in the genera Prevotella, Enterococcus, Bifidobacterium, and Desulfovibrio, and was mirrored in brain-associated gene expression profiles and restrictive dietary patterns in individuals with autism. Pro-inflammatory cytokine profiling and virome association analysis further supported the existence of an autism-specific architecture associated with particular microbial genera. Re-analysis of a longitudinal intervention study in autism recapitulated the cross-sectional profiles, and showed a strong association between temporal changes in microbiome composition and autism symptoms. Further elucidation of the functional architecture of autism, including of the role the microbiome plays in it, will require deep, multi-omic longitudinal intervention studies on well-defined stratified cohorts to support causal and mechanistic inference.

2022

Cutting Some Slack for SGD with Adaptive Polyak Stepsizes

R. M. Gower, Mathieu Blondel, Nidham Gazagnadou, Fabian Pedregosa

Tuning the step size of stochastic gradient descent is tedious and error prone. This has motivated the development of methods that automatically adapt the step size using readily available information. In this paper, we consider the family of SPS (Stochastic gradient with a Polyak Stepsize) adaptive methods. These are methods that make use of gradient and loss value at the sampled points to adaptively adjust the step size. We first show that SPS and its recent variants can all be seen as extensions of the Passive-Aggressive methods applied to nonlinear problems. We use this insight to develop new variants of the SPS method that are better suited to nonlinear models. Our new variants are based on introducing a slack variable into the interpolation equations. This single slack variable tracks the loss function across iterations and is used in setting a stable step size. We provide extensive numerical results supporting our new methods and a convergence theory.
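As a point of reference for the family of methods discussed above, the sketch below runs SGD with the classical Polyak-type step size \(\gamma_t = f_i(x_t)/\|\nabla f_i(x_t)\|^2\) (using a per-sample optimal value of zero and a cap \(\gamma_{\max}\)) on a toy interpolating least-squares problem. This is the basic SPS idea only; the slack-variable variants introduced in the paper are not reproduced here, and the problem and cap are invented for illustration.

```python
# SGD with a basic Polyak-type step size on a toy least-squares problem
# where zero loss is attainable (interpolation). The slack-variable SPS
# variants from the paper are NOT implemented here.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true                        # interpolation: zero loss is attainable

x = np.zeros(d)
gamma_max = 1.0                       # cap on the step size
for t in range(2000):
    i = rng.integers(n)               # sample one data point
    residual = A[i] @ x - b[i]
    loss_i = 0.5 * residual ** 2      # f_i(x), per-sample minimum is 0
    grad_i = residual * A[i]          # gradient of f_i at x
    gamma = min(loss_i / (grad_i @ grad_i + 1e-12), gamma_max)
    x -= gamma * grad_i

print(np.linalg.norm(x - x_true))     # small after enough steps
```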
