2573 Publications

A robust and versatile computational peptide design pipeline to inform wet-lab experiments

V. Mulligan, Tristan Zaborniak, Benjamin P. Brown, D. Renfrew

Since Merrifield’s development of solid-phase peptide synthesis, we have seen explosive growth in the number of synthetic building-blocks that can be incorporated into peptides. This has created a problem: the number of possible molecules that could be synthesized is many orders of magnitude greater than the largest conceivable combinatorial libraries. Computational design, based on combinatorial optimization algorithms, addresses this problem by proposing sequences likely to have desired folds and functions. These computational methods complement experiments by reducing astronomically large numbers of combinatorial possibilities to experimentally tractable shortlists. This presentation describes our robust, versatile methods, made available to peptide scientists in the Rosetta and Masala software suites, for designing peptides that fold into rigid conformations. Our physics-based methods generalize to exotic chemical building blocks poorly amenable to machine learning-based methods for want of training data. Our pipeline has produced experimentally validated mixed-chirality peptides that bind to targets of therapeutic interest, and peptides that diffuse across cell membranes. Ongoing research is mapping the sequence optimization problem (which grows intractable even for supercomputers as the number of candidate chemical building blocks grows very large) to current and near-future quantum computers, allowing use of quantum algorithms in the context of the existing, widely used design protocols.
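To illustrate why exhaustive enumeration breaks down, the toy sketch below scores sequences with a pairwise-decomposable energy (one-body terms per position plus two-body terms per pair of positions, the general form that packer-style combinatorial optimizers work with) and compares brute-force search against a simple simulated-annealing search. The energy tables are random placeholders, and all function names and parameters are hypothetical; this is not the Rosetta or Masala API, nor the authors' optimizer.

```python
import itertools
import math
import random


def total_energy(seq, one_body, two_body):
    """One-body terms plus all pairwise (i < j) two-body terms for a sequence of choice indices."""
    e = sum(one_body[i][a] for i, a in enumerate(seq))
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            e += two_body[i][j][seq[i]][seq[j]]
    return e


def brute_force(n_pos, n_choices, one_body, two_body):
    """Enumerate all n_choices**n_pos sequences; only feasible for tiny problems."""
    best = min(itertools.product(range(n_choices), repeat=n_pos),
               key=lambda s: total_energy(s, one_body, two_body))
    return list(best), total_energy(best, one_body, two_body)


def simulated_annealing(n_pos, n_choices, one_body, two_body,
                        steps=5000, t_start=3.0, t_end=0.05, seed=0):
    """Single-position Metropolis moves with a geometric cooling schedule; returns best seen."""
    rng = random.Random(seed)
    seq = [rng.randrange(n_choices) for _ in range(n_pos)]
    e = total_energy(seq, one_body, two_body)
    best, best_e = list(seq), e
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n_pos)
        old = seq[i]
        seq[i] = rng.randrange(n_choices)
        new_e = total_energy(seq, one_body, two_body)
        if new_e <= e or rng.random() < math.exp(-(new_e - e) / t):
            e = new_e  # accept the move
            if e < best_e:
                best, best_e = list(seq), e
        else:
            seq[i] = old  # reject: restore the previous choice
    return best, best_e


def random_problem(n_pos, n_choices, seed=1):
    """Random placeholder energy tables standing in for real residue/rotamer energies."""
    rng = random.Random(seed)
    one = [[rng.uniform(-1, 1) for _ in range(n_choices)] for _ in range(n_pos)]
    two = [[[[rng.uniform(-1, 1) for _ in range(n_choices)] for _ in range(n_choices)]
            for _ in range(n_pos)] for _ in range(n_pos)]
    return one, two


if __name__ == "__main__":
    n_pos, n_choices = 6, 4  # 4**6 = 4096 sequences: still enumerable
    one, two = random_problem(n_pos, n_choices)
    print("search space:", n_choices ** n_pos)
    print("brute force:", brute_force(n_pos, n_choices, one, two))
    print("annealing:  ", simulated_annealing(n_pos, n_choices, one, two))
```

The search space grows as (number of choices)^(number of positions), so brute force is hopeless at realistic sizes; stochastic or heuristic optimizers trade the guarantee of optimality for tractability.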

On the construction of scattering matrices for irregular or elongated enclosures using Green’s representation formula

Carlos Borges, L. Greengard, Michael O'Neil, M. Rachh

Multiple scattering methods are widely used to reduce the computational complexity of acoustic or electromagnetic scattering problems when waves propagate through media containing many identical inclusions. Historically, this numerical technique has been limited to situations in which the inclusions (particles) can be covered by nonoverlapping disks in two dimensions or spheres in three dimensions. This allows for the use of separation of variables in cylindrical or spherical coordinates to represent the solution to the governing partial differential equation. Here, we provide a more flexible approach, applicable to a much larger class of geometries. We use a Green’s representation formula and the associated layer potentials to construct incoming and outgoing solutions on rectangular enclosures. The performance and flexibility of the resulting scattering operator formulation in two dimensions are demonstrated via several numerical examples for multi-particle scattering in free space as well as in layered media. The mathematical formalism extends directly to the three-dimensional case as well, and can easily be coupled with several commercial numerical PDE software packages.
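For reference, the classical Green’s representation formula that such constructions start from can be written as follows for the two-dimensional Helmholtz equation. The notation ($\Omega$ for the enclosure, $\nu$ for its outward unit normal, $k$ for the wavenumber) is ours rather than the paper’s, and the paper’s sign and normal conventions may differ.

```latex
% Fundamental solution of (\Delta + k^2) G = -\delta in two dimensions
G(x,y) = \frac{i}{4} H_0^{(1)}(k\,|x-y|)

% Interior representation: u solves (\Delta + k^2) u = 0 in \Omega
u(x) = \int_{\partial\Omega} \left[ G(x,y)\,\frac{\partial u}{\partial \nu}(y)
       - \frac{\partial G(x,y)}{\partial \nu_y}\, u(y) \right] ds_y ,
       \qquad x \in \Omega

% Exterior representation: u radiating (outgoing) outside \Omega
u(x) = \int_{\partial\Omega} \left[ \frac{\partial G(x,y)}{\partial \nu_y}\, u(y)
       - G(x,y)\,\frac{\partial u}{\partial \nu}(y) \right] ds_y ,
       \qquad x \in \mathbb{R}^2 \setminus \overline{\Omega}
```

In standard multiple-scattering terminology, the first form describes fields that are regular inside the enclosure (incoming) and the second describes fields radiated from it (outgoing); both are determined by the boundary data $u$ and $\partial u / \partial \nu$ on $\partial\Omega$, which is what allows enclosures other than disks.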

Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence

B. Şimşek, Amire Bendjeddou, Daniel Hsu

This work focuses on the gradient flow dynamics of a neural network model that uses correlation loss to approximate a multi-index function on high-dimensional standard Gaussian data. Specifically, the multi-index function we consider is a sum of neurons $f^*(x) \!=\! \sum_{j=1}^k \! \sigma^*(v_j^T x)$ where $v_1, \dots, v_k$ are unit vectors, and $\sigma^*$ lacks the first and second Hermite polynomials in its Hermite expansion. It is known that, for the single-index case ($k\!=\!1$), overcoming the search phase requires polynomial time complexity. We first generalize this result to multi-index functions characterized by vectors in arbitrary directions. After the search phase, it is not clear whether the network neurons converge to the index vectors, or get stuck at a sub-optimal solution. When the index vectors are orthogonal, we give a complete characterization of the fixed points and prove that neurons converge to the nearest index vectors. Therefore, using $n \! \asymp \! k \log k$ neurons ensures finding the full set of index vectors with gradient flow with high probability over random initialization. When $ v_i^T v_j \!=\! \beta \! \geq \! 0$ for all $i \neq j$, we prove the existence of a sharp threshold $\beta_c \!=\! c/(c+k)$ at which the fixed point that computes the average of the index vectors transitions from a saddle point to a minimum. Numerical simulations show that using a correlation loss and a mild overparameterization suffices to learn all of the index vectors when they are nearly orthogonal; however, the correlation loss fails when the dot product between the index vectors exceeds a certain threshold.
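A minimal numerical sketch of this setup, assuming $\sigma^* = \mathrm{He}_3$ (the probabilists’ Hermite polynomial $z^3 - 3z$, whose Hermite expansion has no first- or second-order component) and a hypothetical student network $\sum_i \sigma^*(w_i^T x)$ with unit-norm neurons. The correlation loss is taken here as $-\mathbb{E}[f(x)\,f^*(x)]$; that normalization is an assumption, and the sketch only evaluates the loss rather than running gradient flow.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import HermiteE, hermegauss


def he3(z):
    """Probabilists' Hermite polynomial He_3(z) = z^3 - 3z (no He_1 or He_2 component)."""
    return z**3 - 3.0 * z


def target(X, V):
    """f*(x) = sum_j sigma*(v_j^T x) with sigma* = He_3; X is (n, d), V is (k, d) with unit rows."""
    return he3(X @ V.T).sum(axis=1)


def student(X, W):
    """Hypothetical student network: same activation, unit-norm neurons W of shape (m, d)."""
    return he3(X @ W.T).sum(axis=1)


def empirical_correlation_loss(X, W, V):
    """Monte Carlo estimate of the (assumed) correlation loss -E[f(x) f*(x)]."""
    return -np.mean(student(X, W) * target(X, V))


def population_correlation_loss(W, V):
    """Closed form: E[He_3(w.x) He_3(v.x)] = 3! (w.v)^3 for unit w, v and x ~ N(0, I)."""
    return -6.0 * np.sum((W @ V.T) ** 3)


def hermite_coefficient(f, n, quad_deg=20):
    """Coefficient c_n of He_n in f, i.e. E[f(z) He_n(z)] / n! for z ~ N(0, 1).

    Probabilists' Gauss-Hermite quadrature is exact for polynomial f of modest degree.
    """
    z, w = hermegauss(quad_deg)
    he_n = HermiteE.basis(n)(z)
    return np.sum(w * f(z) * he_n) / (math.sqrt(2 * math.pi) * math.factorial(n))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, m = 20, 3, 8
    V = np.linalg.qr(rng.standard_normal((d, k)))[0].T  # k orthonormal index vectors
    W = rng.standard_normal((m, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)        # m random unit-norm neurons
    X = rng.standard_normal((200_000, d))
    print("c_1, c_2, c_3 of He_3:", [hermite_coefficient(he3, n) for n in (1, 2, 3)])
    print("Monte Carlo loss:", empirical_correlation_loss(X, W, V))
    print("population loss: ", population_correlation_loss(W, V))
    print("loss at W = V:   ", population_correlation_loss(V, V))  # -6k for orthonormal V
```

The closed form makes the geometry explicit: each neuron interacts with each index vector only through $(w_i^T v_j)^3$, so random neurons in high dimension start with very weak signal, which is the search phase described above.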

Efficient Implementation of the Random Phase Approximation with Domain-based Local Pair Natural Orbitals

Yu Hsuan Liang, Xing Zhang, G. K. Chan, T. Berkelbach, Hong-Zhou Ye

We present an efficient implementation of the random phase approximation (RPA) for molecular systems within the domain-based local pair natural orbital (DLPNO) framework. With optimized parameters, DLPNO-RPA achieves approximately 99.9% accuracy in the total correlation energy compared to a canonical implementation, enabling highly accurate reaction energies and potential energy surfaces to be computed while substantially reducing computational costs. As an application, we demonstrate the capability of DLPNO-RPA to efficiently calculate basis set-converged binding energies for a set of large molecules, with results showing excellent agreement with high-level reference data from both coupled cluster and diffusion Monte Carlo. This development paves the way for the routine use of RPA-based methods in molecular quantum chemistry.
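As background for the canonical reference that DLPNO-RPA is compared against, the direct-RPA correlation energy can be computed from the plasmon formula $E_c^{\mathrm{RPA}} = \tfrac12\left(\sum_n \omega_n - \operatorname{tr} A\right)$, where the $\omega_n$ are RPA excitation energies. The sketch below is a toy dense-matrix illustration of that formula with made-up real matrices $A$ and $B$; it does not implement any DLPNO truncation, and the matrices are not derived from a molecule.

```python
import numpy as np


def sqrtm_spd(M):
    """Matrix square root of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.sqrt(vals)) @ vecs.T


def rpa_correlation_energy(A, B):
    """Plasmon-formula correlation energy 0.5 * (sum(omega) - tr(A)) for real A, B.

    Assumes A - B and A + B are positive definite, so the RPA excitation energies omega
    are the square roots of the eigenvalues of (A-B)^{1/2} (A+B) (A-B)^{1/2}.
    """
    s = sqrtm_spd(A - B)
    omega_sq = np.linalg.eigvalsh(s @ (A + B) @ s)
    return 0.5 * (np.sum(np.sqrt(omega_sq)) - np.trace(A))


def fraction_recovered(e_approx, e_reference):
    """Fraction of a reference correlation energy recovered by an approximation (e.g. 0.999)."""
    return e_approx / e_reference


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 6
    D = np.diag(rng.uniform(1.0, 2.0, n))   # toy orbital-energy differences
    C = 0.1 * rng.standard_normal((n, n))
    K = C @ C.T                              # toy positive-semidefinite coupling
    A, B = D + K, K                          # direct-RPA structure: A = D + K, B = K
    print("toy dRPA correlation energy:", rpa_correlation_energy(A, B))
```

In this toy structure $A - B = D$ and $A + B = D + 2K$ are positive definite by construction; a local-correlation scheme such as DLPNO would instead approximate the same quantity from truncated pair-specific orbital spaces, and its accuracy would be quoted as a fraction of the canonical value.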

Diabatic states of charge transfer with constrained charge equilibration

Sohang Kundu, Hong-Zhou Ye, T. Berkelbach

Charge transfer (CT) processes that are electronically non-adiabatic are ubiquitous in chemistry, biology, and materials science, but their theoretical description requires diabatic states or adiabatic excited states. For complex systems, these latter states are more difficult to calculate than the adiabatic ground state. Here, we propose a simple method to obtain diabatic states, including energies and charges, by constraining the atomic charges within the charge equilibration framework. For two-state systems, the exact diabatic coupling can be determined, from which the adiabatic excited-state energy can also be calculated. The method can be viewed as an affordable alternative to constrained density functional theory (CDFT), and so we call it constrained charge equilibration (CQEq). We test the CQEq method on the anthracene-tetracyanoethylene CT complex and the reductive decomposition of ethylene carbonate on a lithium metal surface. We find that CQEq predicts diabatic energies, charges, and adiabatic excitation energies in good agreement with CDFT, and we propose that CQEq is promising for combination with machine learning force fields to study non-adiabatic CT in the condensed phase.
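The two-state relation mentioned above follows from the fact that a $2\times 2$ diabatic Hamiltonian with diagonal energies $E_1, E_2$ and coupling $V$ has eigenvalues whose sum is $E_1 + E_2$ and whose product is $E_1E_2 - V^2$. Given the adiabatic ground-state energy $E_g$, this yields $|V| = \sqrt{(E_1 - E_g)(E_2 - E_g)}$ and $E_e = E_1 + E_2 - E_g$. The sketch below pairs that relation with a generic constrained charge-equilibration solve, assuming the common quadratic form $E(q) = \chi^T q + \tfrac12 q^T J q$ with linear charge constraints enforced by Lagrange multipliers, and treating the solution with only the total charge fixed as the ground state. The electronegativities, hardness matrix, and fragment split are toy placeholders, not parameters from the paper.

```python
import numpy as np


def constrained_qeq(chi, J, C, c):
    """Minimize chi.q + 0.5 q^T J q subject to C q = c (e.g. fixed total and fragment charges).

    Solves the KKT system [[J, C^T], [C, 0]] [q; lam] = [-chi; c].
    Returns the charges q and the minimized energy.
    """
    n, m = len(chi), len(c)
    kkt = np.zeros((n + m, n + m))
    kkt[:n, :n] = J
    kkt[:n, n:] = C.T
    kkt[n:, :n] = C
    rhs = np.concatenate([-chi, c])
    q = np.linalg.solve(kkt, rhs)[:n]
    return q, chi @ q + 0.5 * q @ J @ q


def two_state_from_diabats(e1, e2, e_ground):
    """Exact 2x2 relations: |V| = sqrt((E1-Eg)(E2-Eg)) and E_excited = E1 + E2 - Eg."""
    coupling = np.sqrt((e1 - e_ground) * (e2 - e_ground))
    return coupling, e1 + e2 - e_ground


if __name__ == "__main__":
    chi = np.array([-0.5, -0.3, 0.4, 0.6])       # toy electronegativities
    J = np.array([[1.0, 0.2, 0.1, 0.1],
                  [0.2, 1.0, 0.1, 0.1],
                  [0.1, 0.1, 1.2, 0.3],
                  [0.1, 0.1, 0.3, 1.2]])         # toy hardness/Coulomb matrix (positive definite)
    total = np.array([[1.0, 1.0, 1.0, 1.0]])
    donor = np.array([[1.0, 1.0, 0.0, 0.0]])     # atoms 0,1 = donor; atoms 2,3 = acceptor
    _, e_g = constrained_qeq(chi, J, total, np.array([0.0]))
    C = np.vstack([total, donor])
    _, e_1 = constrained_qeq(chi, J, C, np.array([0.0, 0.0]))  # neutral diabat
    _, e_2 = constrained_qeq(chi, J, C, np.array([0.0, 1.0]))  # donor+/acceptor- diabat
    print("coupling, excited-state energy:", two_state_from_diabats(e_1, e_2, e_g))
```

Because each diabatic solve adds constraints to the ground-state problem, its energy can only be equal or higher, so the square root in the coupling formula is always real in this setting.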

Nuclear instance segmentation and tracking for preimplantation mouse embryos

H. Nunley, Binglun Shao, Prateek Grover, A. Watters, S. Shvartsman, L. M. Brown, et al.

For investigations into fate specification and morphogenesis in time-lapse images of preimplantation embryos, automated 3D instance segmentation and tracking of nuclei are invaluable. Low signal-to-noise ratio, high voxel anisotropy, high nuclear density, and variable nuclear shapes can limit the performance of segmentation methods, while tracking is complicated by cell divisions, low frame rates, and sample movements. Supervised machine learning approaches can radically improve segmentation accuracy and enable easier tracking, but they often require large amounts of annotated 3D data. Here, we first report a new mouse line expressing the near-infrared nuclear reporter H2B-miRFP720. We then generate a dataset (termed BlastoSPIM) of 3D images of H2B-miRFP720-expressing embryos with ground truth for nuclear instances. Using BlastoSPIM, we benchmark seven convolutional neural networks and identify Stardist-3D as the most accurate instance segmentation method. With our BlastoSPIM-trained Stardist-3D models, we construct a complete pipeline for nuclear instance segmentation and lineage tracking from the eight-cell stage to the end of preimplantation development (>100 nuclei). Finally, we demonstrate the usefulness of BlastoSPIM as pretraining data for related problems, both for a different imaging modality and for different model systems.

Task-Relevant Covariance from Manifold Capacity Theory Improves Robustness in Deep Networks

William Yang, C. Chou, S. Chung

Analysis of high-dimensional representations in neuroscience and deep learning traditionally places equal importance on all points in a representation, potentially leading to significant information loss. Recent advances in manifold capacity theory offer a principled framework for identifying the computationally relevant points on neural manifolds. In this work, we introduce the concept of task-relevant class covariance to identify directions in representation-space supporting class discriminability. We demonstrate that scaling representations along these directions markedly improves simulated accuracy under distribution shift. Building on these insights, we propose AnchorBlocks, architectural modules that use task-relevant class covariance to align representations with a task-relevant eigenspace. By appending one AnchorBlock onto ResNet18, we achieve competitive performance in a standard domain adaptation benchmark (CIFAR-10C) against much larger robustness-promoting architectures. Our findings provide insight into neural population geometry and methods to interpret/build robust deep learning systems.

Opening the Black Box inside Grover’s Algorithm

M. Stoudenmire, Xavier Waintal

Grover’s algorithm is one of the primary algorithms offered as evidence that quantum computers can provide an advantage over classical computers. It involves an “oracle” (external quantum subroutine), which must be specified for a given application and whose internal structure is not part of the formal scaling of the quadratic quantum speedup guaranteed by the algorithm. Grover’s algorithm also requires exponentially many calls to the quantum oracle (approximately √(2^n) calls, where n is the number of qubits) to succeed, raising the question of its implementation on both noisy and error-corrected quantum computers. In this work, we construct a quantum-inspired algorithm executable on a classical computer that performs Grover’s task in a linear number of calls to (simulations of) the oracle, exponentially fewer than Grover’s algorithm requires, and demonstrate this algorithm explicitly for Boolean satisfiability problems. The complexity of our algorithm depends on the cost to simulate the oracle once, which may or may not be exponential, depending on its internal structure. Indeed, Grover’s algorithm does not have an a priori quantum speedup as soon as one is given access to the “source code” of the oracle, which may reveal an internal structure of the problem. Our findings illustrate this point explicitly, as our algorithm exploits the structure of the quantum circuit used to program the quantum computer to speed up the search. There are still problems where Grover’s algorithm would provide an asymptotic speedup if it could be run accurately for large enough sizes. Our quantum-inspired algorithm provides lower bounds, in terms of the quantum-circuit complexity, for the quantum hardware to beat classical approaches for these problems.
These estimates, combined with the unfavorable scaling of the success probability of Grover’s algorithm, which in the presence of noise decays as the exponential of the exponential of the number of qubits, make a practical speedup unrealistic even under extremely optimistic assumptions about the evolution of both hardware quality and availability.
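To make the oracle-call count concrete: for unstructured search over $N = 2^n$ items with $M$ marked items, the textbook noiseless analysis gives a success probability of $\sin^2((2r+1)\theta)$ after $r$ Grover iterations, where $\sin\theta = \sqrt{M/N}$, so roughly $(\pi/4)\sqrt{2^n}$ iterations are needed when $M = 1$. The sketch below tabulates that count next to a budget of $n$ oracle evaluations, purely to show the difference in scaling; it ignores noise and says nothing about the cost of simulating any particular oracle classically, which the abstract identifies as the deciding factor.

```python
import math


def grover_iterations(n_qubits, n_marked=1):
    """Iteration count floor(pi / (4 theta)) for N = 2**n_qubits items, sin(theta) = sqrt(M/N)."""
    theta = math.asin(math.sqrt(n_marked / 2**n_qubits))
    return math.floor(math.pi / (4 * theta))


def grover_success_probability(n_qubits, iterations, n_marked=1):
    """Noiseless success probability sin^2((2r + 1) theta) after r iterations."""
    theta = math.asin(math.sqrt(n_marked / 2**n_qubits))
    return math.sin((2 * iterations + 1) * theta) ** 2


if __name__ == "__main__":
    for n in (10, 20, 40, 60):
        r = grover_iterations(n)
        p = grover_success_probability(n, r)
        print(f"n={n:2d}  Grover oracle calls={r:>16,d}  success={p:.6f}  linear budget={n}")
```

With this choice of iteration count the noiseless failure probability is at most $M/N$, so the exponential cost lies entirely in the number of sequential oracle calls, each of which must be executed coherently.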

Bounding the speedup of the quantum-enhanced Markov-chain Monte Carlo algorithm

Sampling tasks are a natural class of problems for quantum computers due to the probabilistic nature of the Born rule. Sampling from useful distributions on noisy quantum hardware remains a challenging problem. A recent paper [D. Layden et al., Nature (London) 619, 282 (2023).] proposed a quantum-enhanced Markov-chain Monte Carlo algorithm where moves are generated by a quantum device and accepted or rejected by a classical algorithm. While this procedure is robust to noise and control imperfections, its potential for quantum advantage is unclear. Here we show that there is no speedup over classical sampling on a worst-case unstructured sampling problem. We present an upper bound to the Markov gap that rules out a speedup for any unital quantum proposal.
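Because the hybrid scheme keeps the classical Metropolis-Hastings acceptance step, the quality of a proposal can be summarized by the spectral gap of the resulting transition matrix. The sketch below builds that matrix for a small toy Ising chain with a user-supplied symmetric proposal matrix and reports the gap $1 - \lambda_2$. It assumes the proposal is symmetric (so the acceptance probability reduces to $\min(1, e^{-\beta\Delta E})$), and the uniform and single-spin-flip proposals are classical stand-ins; they are not a model of the quantum device or of the bound proved in the paper.

```python
import itertools

import numpy as np


def ising_energies(n, J=1.0, h=0.0):
    """Energies of all 2**n spin configurations of an open 1D Ising chain."""
    configs = np.array(list(itertools.product([-1, 1], repeat=n)))
    e = -J * np.sum(configs[:, :-1] * configs[:, 1:], axis=1) - h * configs.sum(axis=1)
    return configs, e


def metropolis_matrix(energies, Q, beta):
    """P(x -> y) = Q[x, y] * min(1, exp(-beta (E_y - E_x))) for a symmetric proposal Q."""
    dE = energies[None, :] - energies[:, None]
    P = Q * np.minimum(1.0, np.exp(-beta * dE))
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))  # rejected moves stay put
    return P


def spectral_gap(P, energies, beta):
    """1 - lambda_2, via the Boltzmann-symmetrized matrix (P is reversible w.r.t. pi)."""
    pi = np.exp(-beta * (energies - energies.min()))
    pi /= pi.sum()
    d = np.sqrt(pi)
    S = (d[:, None] * P) / d[None, :]
    eigs = np.sort(np.linalg.eigvalsh(0.5 * (S + S.T)))
    return 1.0 - eigs[-2]


def uniform_proposal(dim):
    """Propose any configuration uniformly at random."""
    return np.full((dim, dim), 1.0 / dim)


def single_flip_proposal(configs):
    """Propose flipping one uniformly chosen spin (a local classical proposal)."""
    n = configs.shape[1]
    hamming = (configs[:, None, :] != configs[None, :, :]).sum(axis=2)
    return (hamming == 1) / n


if __name__ == "__main__":
    n, beta = 6, 1.0
    configs, e = ising_energies(n)
    for name, Q in [("uniform", uniform_proposal(2**n)),
                    ("single flip", single_flip_proposal(configs))]:
        P = metropolis_matrix(e, Q, beta)
        print(f"{name:12s} spectral gap = {spectral_gap(P, e, beta):.4f}")
```

A larger gap means faster mixing; a speedup claim for a new proposal amounts to showing its gap beats the best classical proposals, which is the kind of comparison an upper bound on the gap can rule out.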
November 1, 2024

WASPSYN: A Challenge for Domain Adaptive Synapse Detection in Microwasp Brain Connectomes

Yicong Li, Wanhua Li, Qi Chen, Wei Huang, Yuda Zou, Xin Xiao, K. Shinomiya, P. Gunn, Nishika Gupta, Alexey Polilov, Yongchao Xu, Yueyi Zhang, Zhiwei Xiong, Hanspeter Pfister, Donglai Wei, J. Wu

The size of image volumes in connectomics studies now reaches terabyte and often petabyte scales, with a great diversity of appearance due to different sample preparation procedures. However, manual annotation of neuronal structures (e.g., synapses) in these huge image volumes is time-consuming, so labeled training data often amount to less than 0.001% of the large-scale image volumes used in applications. Methods that can utilize in-domain labeled data and generalize to out-of-domain unlabeled data are urgently needed. Although many domain adaptation approaches have been proposed to address such issues in the natural image domain, few of them have been evaluated on connectomics data due to a lack of domain adaptation benchmarks. Therefore, to enable the development of domain adaptive synapse detection methods for large-scale connectomics applications, we annotated 14 image volumes from a biologically diverse set of Megaphragma viggianii brain regions originating from three different whole-brain datasets and organized the WASPSYN challenge at ISBI 2023. The annotations include coordinates of pre-synapses and post-synapses in the 3D space, together with their one-to-many connectivity information. This paper describes the dataset, the tasks, the proposed baseline, the evaluation method, and the results of the challenge. Limitations of the challenge and the impact on neuroscience research are also discussed. The challenge is and will continue to be available at https://codalab.lisn.upsaclay.fr/competitions/9169. Successful algorithms that emerge from our challenge may revolutionize real-world connectomics research and further the effort to unravel the complexity of brain structure and function.
