CCM: Publications

New Statistical Metric for Robust Target Detection in Cryo-EM Using 2DTM

Kexin Zhang, P. Cossio, A. Rangan, Bronwyn Lucas, Nikolaus Grigorieff

2D template matching (2DTM) can be used to detect molecules and their assemblies in cellular cryo-EM images with high positional and orientational accuracy. While 2DTM successfully detects spherical targets such as large ribosomal subunits, challenges remain in detecting smaller and more aspherical targets in various environments. In this work, a novel 2DTM metric, referred to as the 2DTM p-value, is developed to extend the 2DTM framework to more complex applications. The 2DTM p-value combines information from two previously used 2DTM metrics, namely the 2DTM signal-to-noise ratio (SNR) and z-score, which are derived from the cross-correlation coefficient between the target and the template. The 2DTM p-value demonstrates robust detection accuracies under various imaging and sample conditions and outperforms the 2DTM SNR and z-score alone. Specifically, the 2DTM p-value improves the detection of aspherical targets such as a modified artificial tubulin patch particle (500 kDa) and a much smaller clathrin monomer (193 kDa) in simulated data. It also accurately recovers mature 60S ribosomes in yeast lamellae samples, even under conditions of increased Gaussian noise. The new metric will enable the detection of a wider variety of targets in both purified and cellular samples through 2DTM.

Show Abstract

nifty-ls: Fast and Accurate Lomb–Scargle Periodograms Using a Non-uniform FFT

Lehman H. Garrison, D. Foreman-Mackey, Yu-hsuan Shih, A. Barnett

We present nifty-ls, a software package for fast and accurate evaluation of the Lomb–Scargle periodogram. nifty-ls leverages the fact that Lomb–Scargle can be computed using a non-uniform fast Fourier transform (NUFFT), which we evaluate with the Flatiron Institute NUFFT package (finufft). This approach achieves a many-fold speedup over the Press & Rybicki method as implemented in Astropy and is simultaneously many orders of magnitude more accurate. nifty-ls also supports fast evaluation on GPUs via CUDA and integrates with the Astropy Lomb–Scargle interface. nifty-ls is publicly available at https://github.com/flatironinstitute/nifty-ls/.

Show Abstract

Coordinate complexification for the Helmholtz equation with Dirichlet boundary conditions in a perturbed half-space

C. Epstein, L. Greengard, Jeremy Hoskins, S. Jiang, M. Rachh

We present a new complexification scheme based on the classical double layer potential for the solution of the Helmholtz equation with Dirichlet boundary conditions in compactly perturbed half-spaces in two and three dimensions. The kernel for the double layer potential is the normal derivative of the free-space Green's function, which has a well-known analytic continuation into the complex plane as a function of both target and source locations. Here, we prove that - when the incident data are analytic and satisfy a precise asymptotic estimate - the solution to the boundary integral equation itself admits an analytic continuation into specific regions of the complex plane, and satisfies a related asymptotic estimate (this class of data includes both plane waves and the field induced by point sources). We then show that, with a carefully chosen contour deformation, the oscillatory integrals are converted to exponentially decaying integrals, effectively reducing the infinite domain to a domain of finite size. Our scheme is different from existing methods that use complex coordinate transformations, such as perfectly matched layers, or absorbing regions, such as the gradual complexification of the governing wavenumber. More precisely, in our method, we are still solving a boundary integral equation, albeit on a truncated, complexified version of the original boundary. In other words, no volumetric/domain modifications are introduced. The scheme can be extended to other boundary conditions, to open wave guides and to layered media. We illustrate the performance of the scheme with two and three dimensional examples.

Show Abstract

A comprehensive exploration of quasisymmetric stellarators and their coil sets

A. Giuliani, Eduardo Rodríguez, M. Spivak

We augment the `QUAsi-symmetric Stellarator Repository' (QUASR) to include vacuum field stellarators with quasihelical symmetry using a globalized optimization workflow. The database now has almost 370,000 quasisaxisymmetry and quasihelically symmetric devices along with coil sets, optimized for a variety of aspect ratios, rotational transforms, and discrete rotational symmetries. This paper outlines a couple of ways to explore and characterize the data set. We plot devices on a near-axis quasisymmetry landscape, revealing close correspondence to this predicted landscape. We also use principal component analysis to reduce the dimensionality of the data so that it can easily be visualized in two or three dimensions. Principal component analysis also gives a mechanism to compare the new devices here to previously published ones in the literature. We are able to characterize the structure of the data, observe clusters, and visualize the progression of devices in these clusters. These techniques reveal that the data has structure, and that typically one, two or three principal components are sufficient to characterize it. QUASR is archived at this https URL and can be explored online at this http URL.

Show Abstract

Decomposing imaginary time Feynman diagrams using separable basis functions: Anderson impurity model strong coupling expansion

J. Kaye, Zhen Huang, Hugo Strand, Denis Golez

We present a deterministic algorithm for the efficient evaluation of imaginary time diagrams based on the recently introduced discrete Lehmann representation (DLR) of imaginary time Green's functions. In addition to the efficient discretization of diagrammatic integrals afforded by its approximation properties, the DLR basis is separable in imaginary time, allowing us to decompose diagrams into linear combinations of nested sequences of one-dimensional products and convolutions. Focusing on the strong coupling bold-line expansion of generalized Anderson impurity models, we show that our strategy reduces the computational complexity of evaluating an $M$th-order diagram at inverse temperature $\beta$ and spectral width $\omega_{\max}$ from $\mathcal{O}((\beta \omega_{\max})^{2M-1})$ for a direct quadrature to $\mathcal{O}(M (\log (\beta \omega_{\max}))^{M+1})$, with controllable high-order accuracy. We benchmark our algorithm using third-order expansions for multi-band impurity problems with off-diagonal hybridization and spin-orbit coupling, presenting comparisons with exact diagonalization and quantum Monte Carlo approaches. In particular, we perform a self-consistent dynamical mean-field theory calculation for a three-band Hubbard model with strong spin-orbit coupling representing a minimal model of Ca$_2$RuO$_4$, demonstrating the promise of the method for modeling realistic strongly correlated multi-band materials. For both strong and weak coupling expansions of low and intermediate order, in which diagrams can be enumerated, our method provides an efficient, straightforward, and robust black-box evaluation procedure. In this sense, it fills a gap between diagrammatic approximations of the lowest order, which are simple and inexpensive but inaccurate, and those based on Monte Carlo sampling of high-order diagrams.

Show Abstract

Cosmological constraints from non-Gaussian and nonlinear galaxy clustering using the SimBIG inference framework

ChangHoon Hahn, Pablo Lemos, Liam Parker, B. Régaldo-Saint Blancard, M. Eickenberg, Shirley Ho, Ph.D. , Jiamin Hou, Elena Massara , Chirag Modi , Azadeh Moradinezhad Dizgah, David Spergel, Ph.D.

The standard ΛCDM cosmological model predicts the presence of cold dark matter, with the current accelerated expansion of the Universe driven by dark energy. This model has recently come under scrutiny because of tensions in measurements of the expansion and growth histories of the Universe, parameterized using H0 and S8. The three-dimensional clustering of galaxies encodes key cosmological information that addresses these tensions. Here we present a set of cosmological constraints using simulation-based inference that exploits additional non-Gaussian information on nonlinear scales from galaxy clustering, inaccessible with current analyses. We analyse a subset of the Baryon Oscillation Spectroscopic Survey (BOSS) galaxy survey using SimBIG, a new framework for cosmological inference that leverages high-fidelity simulations and deep generative models. We use two clustering statistics beyond the standard power spectrum: the bispectrum and a summary of the galaxy field based on a convolutional neural network. We constrain H0 and S8 1.5 and 1.9 times more tightly than power spectrum analyses. With this increased precision, our constraints are competitive with those of other cosmological probes, even with only 10% of the full BOSS volume. Future work extending SimBIG to upcoming spectroscopic galaxy surveys (DESI, PFS, Euclid) will produce improved cosmological constraints that will develop understanding of cosmic tensions.

Show Abstract

Incorporating Local Step-Size Adaptivity into the No-U-Turn Sampler using Gibbs Self Tuning

N. Bou-Rabee, B. Carpenter, Tore Selland Kleppe, Milo Marsden

Adapting the step size locally in the no-U-turn sampler (NUTS) is challenging because the step-size and path-length tuning parameters are interdependent. The determination of an optimal path length requires a predefined step size, while the ideal step size must account for errors along the selected path. Ensuring reversibility further complicates this tuning problem. In this paper, we present a method for locally adapting the step size in NUTS that is an instance of the Gibbs self-tuning (GIST) framework. Our approach guarantees reversibility with an acceptance probability that depends exclusively on the conditional distribution of the step size. We validate our step-size-adaptive NUTS method on Neal's funnel density and a high-dimensional normal distribution, demonstrating its effectiveness in challenging scenarios.

Show Abstract

Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics

Alireza Mousavi-Hosseini, D. Wu, Murat A. Erdogdu

We study the problem of learning multi-index models in high-dimensions using a two-layer neural network trained with the mean-field Langevin algorithm. Under mild distributional assumptions on the data, we characterize the effective dimension $d_{\mathrm{eff}}$ that controls both sample and computational complexity by utilizing the adaptivity of neural networks to latent low-dimensional structures. When the data exhibit such a structure, $d_{\mathrm{eff}}$ can be significantly smaller than the ambient dimension. We prove that the sample complexity grows almost linearly with $d_{\mathrm{eff}}$, bypassing the limitations of the information and generative exponents that appeared in recent analyses of gradient-based feature learning. On the other hand, the computational complexity may inevitably grow exponentially with $d_{\mathrm{eff}}$ in the worst-case scenario. Motivated by improving computational complexity, we take the first steps towards polynomial time convergence of the mean-field Langevin algorithm by investigating a setting where the weights are constrained to be on a compact manifold with positive Ricci curvature, such as the hypersphere. There, we study assumptions under which polynomial time convergence is achievable, whereas similar assumptions in the Euclidean setting lead to exponential time complexity.

Show Abstract

CryoBench: Diverse and challenging datasets for the heterogeneity problem in cryo-EM

Minkyu Jeon, M. Astore, S. Hanson, P. Cossio, et al.

Cryo-electron microscopy (cryo-EM) is a powerful technique for determining high-resolution 3D biomolecular structures from imaging data. As this technique can capture dynamic biomolecular complexes, 3D reconstruction methods are increasingly being developed to resolve this intrinsic structural heterogeneity. However, the absence of standardized benchmarks with ground truth structures and validation metrics limits the advancement of the field. Here, we propose CryoBench, a suite of datasets, metrics, and performance benchmarks for heterogeneous reconstruction in cryo-EM. We propose five datasets representing different sources of heterogeneity and degrees of difficulty. These include conformational heterogeneity generated from simple motions and random configurations of antibody complexes and from tens of thousands of structures sampled from a molecular dynamics simulation. We also design datasets containing compositional heterogeneity from mixtures of ribosome assembly states and 100 common complexes present in cells. We then perform a comprehensive analysis of state-of-the-art heterogeneous reconstruction tools including neural and non-neural methods and their sensitivity to noise, and propose new metrics for quantitative comparison of methods. We hope that this benchmark will be a foundational resource for analyzing existing methods and new algorithmic development in both the cryo-EM and machine learning communities.

Show Abstract

cppdlr: Imaginary time calculations using the discrete Lehmann representation

J. Kaye, Hugo U. r. Strand, Nils Wentzell

We introduce cppdlr, a C++ library implementing the discrete Lehmann representation (DLR) of functions in imaginary time and Matsubara frequency, such as Green's functions and self-energies. The DLR is based on a low-rank approximation of the analytic continuation kernel, and yields a compact and explicit basis consisting of exponentials in imaginary time and simple poles in Matsubara frequency. cppdlr constructs the DLR basis and associated interpolation grids, and implements standard operations. It provides a flexible yet high-level interface, facilitating the incorporation of the DLR into both small-scale applications and existing large-scale software projects.

Show Abstract