CCM: Publications

Structural Variability from Noisy Tomographic Projections

In cryo-electron microscopy, the three-dimensional (3D) electric potentials of an ensemble of molecules are projected along arbitrary viewing directions to yield noisy two-dimensional images. The volume maps representing these potentials typically exhibit a great deal of structural variability, which is described by their 3D covariance matrix. Typically, this covariance matrix is approximately low rank and can be used to cluster the volumes or estimate the intrinsic geometry of the conformation space. We formulate the estimation of this covariance matrix as a linear inverse problem, yielding a consistent least-squares estimator. For $n$ images of size $N$-by-$N$ pixels, we propose an algorithm for calculating this covariance estimator with computational complexity $O(n N^4 + \sqrt{\kappa} N^6 \log N)$, where the condition number $\kappa$ is empirically in the range 10 to 200. Its efficiency relies on the observation that the normal equations are equivalent to a deconvolution problem in six dimensions. This is then solved by the conjugate gradient method with an appropriate circulant preconditioner. The result is the first computationally efficient algorithm for consistent estimation of the 3D covariance from noisy projections. It also compares favorably in runtime with respect to previously proposed nonconsistent estimators. Motivated by the recent success of eigenvalue shrinkage procedures for high-dimensional covariance matrix estimation, we incorporate a shrinkage procedure that improves accuracy at lower signal-to-noise ratios. We evaluate our methods on simulated datasets and achieve classification results comparable to state-of-the-art methods in shorter running time. We also present results on clustering volumes in an experimental dataset, illustrating the power of the proposed algorithm for practical determination of structural variability.
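
The solver described in this abstract (conjugate gradient on a deconvolution problem, with a circulant preconditioner) can be illustrated by a one-dimensional toy analogue. The sketch below is not the authors' six-dimensional algorithm: it assumes a symmetric positive definite Toeplitz system, and it uses T. Chan's optimal circulant preconditioner as one possible choice, since the abstract does not say which circulant is used. The function names `toeplitz_matvec`, `chan_preconditioner`, and `pcg` are hypothetical.

```python
import numpy as np

def toeplitz_matvec(col, x):
    """Multiply a symmetric Toeplitz matrix (given by its first column) by x.

    The matrix is embedded in a circulant of size 2n so that the product
    costs O(n log n) via the FFT instead of O(n^2).
    """
    n = len(col)
    c = np.concatenate([col, [0.0], col[:0:-1]])  # circulant first column
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x, 2 * n))
    return y[:n].real

def chan_preconditioner(col):
    """Return a function applying the inverse of T. Chan's circulant
    approximation of the symmetric Toeplitz matrix with first column `col`.
    (Assumed choice of preconditioner, for illustration only.)
    """
    n = len(col)
    k = np.arange(n)
    wrapped = np.concatenate([[0.0], col[:0:-1]])  # wrapped[k] = col[n-k], k >= 1
    c = ((n - k) * col + k * wrapped) / n          # circulant first column
    eig = np.fft.fft(c).real                       # circulant eigenvalues
    return lambda r: np.fft.ifft(np.fft.fft(r) / eig).real

def pcg(matvec, precond, b, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient for a symmetric positive definite system."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for it in range(max_iter):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it + 1
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Toy problem: Toeplitz matrix with entries 0.9^|i-j| (positive definite).
n = 64
col = 0.9 ** np.arange(n)
b = np.random.default_rng(0).standard_normal(n)
x, iters = pcg(lambda v: toeplitz_matvec(col, v), chan_preconditioner(col), b)
print("converged in", iters, "iterations")
```

In this toy setting both the operator and the preconditioner are applied with FFTs, which is the same structural idea that makes a convolution-type system cheap to solve iteratively.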

Integral equation methods for electrostatics, acoustics and electromagnetics in smoothly varying, anisotropic media

Lise-Marie Imbert-Gerard, Felipe Vico, L. Greengard, Miguel Ferrando

We present a collection of well-conditioned integral equation methods for the solution of electrostatic, acoustic or electromagnetic scattering problems involving anisotropic, inhomogeneous media. In the electromagnetic case, our approach involves a minor modification of a classical formulation. In the electrostatic or acoustic setting, we introduce a new vector partial differential equation, from which the desired solution is easily obtained. It is the vector equation for which we derive a well-conditioned integral equation. In addition to providing a unified framework for these solvers, we illustrate their performance using iterative solution methods coupled with the FFT-based technique of [1] to discretize and apply the relevant integral operators.

Prediction error bounds for linear regression with the TREX

Jacob Bien, Irina Gaynanova, Johannes Lederer, C. Müller

The TREX is a recently introduced approach to sparse linear regression. In contrast to most well-known approaches to penalized regression, the TREX can be formulated without the use of tuning parameters. In this paper, we establish the first known prediction error bounds for the TREX. Additionally, we introduce extensions of the TREX to a more general class of penalties, and we provide a bound on the prediction error in this generalized setting. These results deepen the understanding of the TREX from a theoretical perspective and provide new insights into penalized regression in general.

An adaptive fast Gauss transform in two dimensions

J. Wang, L. Greengard

A variety of problems in computational physics and engineering require the convolution of the heat kernel (a Gaussian) with either discrete sources, densities supported on boundaries, or continuous volume distributions. We present a unified fast Gauss transform for this purpose in two dimensions, making use of an adaptive quad-tree discretization on a unit square which is assumed to contain all sources. Our implementation permits either free-space or periodic boundary conditions to be imposed, and is efficient for any choice of variance in the Gaussian.
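
For reference, the sum that a fast Gauss transform accelerates can be written out directly for discrete sources. The sketch below is the naive O(MN) evaluation, not the adaptive quad-tree algorithm of the paper. The function name `direct_gauss_transform`, the parameter name `delta`, and the unnormalized kernel exp(-|x - y|^2 / delta) are assumptions made for illustration.

```python
import numpy as np

def direct_gauss_transform(targets, sources, charges, delta):
    """Evaluate u(x_i) = sum_j q_j * exp(-|x_i - y_j|^2 / delta) directly.

    targets: (M, 2) array of evaluation points x_i
    sources: (N, 2) array of source locations y_j
    charges: (N,) array of source strengths q_j
    delta:   positive scalar controlling the Gaussian width (assumed convention)
    """
    diff = targets[:, None, :] - sources[None, :, :]   # (M, N, 2) pairwise differences
    sq_dist = np.sum(diff ** 2, axis=-1)               # (M, N) squared distances
    return np.exp(-sq_dist / delta) @ charges          # (M,) potentials

# Random sources and targets in the unit square.
rng = np.random.default_rng(0)
sources = rng.random((200, 2))
targets = rng.random((50, 2))
charges = rng.standard_normal(200)
u = direct_gauss_transform(targets, sources, charges, delta=0.01)
print(u.shape)
```

A direct evaluation of this kind is mainly useful as a correctness check for a fast method on small problems, since its cost grows with the product of the number of sources and targets.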

Decoupled field integral equations for electromagnetic scattering from homogeneous penetrable obstacles

Felipe Vico, L. Greengard, Miguel Ferrando

We present a new method for the analysis of electromagnetic scattering from homogeneous penetrable bodies. Our approach is based on a reformulation of the governing Maxwell equations in terms of two uncoupled vector Helmholtz systems: one for the electric field and one for the magnetic field. This permits the derivation of resonance-free Fredholm equations of the second kind that are stable at all frequencies, insensitive to the genus of the scatterers, and invertible for all passive materials including those with negative permittivities or permeabilities. We refer to these as decoupled field integral equations.

Temporal probabilistic modeling of bacterial compositions derived from 16S rRNA sequencing

Tarmo Äijö, C. Müller, R. Bonneau

The number of microbial and metagenomic studies has increased drastically due to advancements in next-generation sequencing-based measurement techniques. Statistical analysis and the validity of conclusions drawn from (time series) 16S rRNA and other metagenomic sequencing data are hampered by the presence of a significant amount of noise and missing data (sampling zeros). Accounting for uncertainty in microbiome data is often challenging due to the difficulty of obtaining biological replicates. Additionally, the compositional nature of current amplicon and metagenomic data differs from many other biological data types, adding another challenge to the data analysis. To address these challenges in human microbiome research, we introduce a novel probabilistic approach to explicitly model overdispersion and sampling zeros by considering the temporal correlation between nearby time points using Gaussian Processes. The proposed Temporal Gaussian Process Model for Compositional Data Analysis (TGP-CODA) shows superior modeling performance compared to commonly used Dirichlet-multinomial, multinomial, and non-parametric regression models on real and synthetic data. We demonstrate that the nonreplicative nature of human gut microbiota studies can be partially overcome by our method with proper experimental design of dense temporal sampling. We also show that different modeling approaches have a strong impact on ecological interpretation of the data, such as stationarity, persistence, and environmental noise models. A Stan implementation of the proposed method is available under MIT license at

APPLE Picker: Automatic Particle Picking, a Low-Effort Cryo-EM Framework

Ayelet Heimowitz, J. Andén, A. Singer

Particle picking is a crucial first step in the computational pipeline of single-particle cryo-electron microscopy (cryo-EM). Selecting particles from the micrographs is difficult, especially for small particles with low contrast. As high-resolution reconstruction typically requires hundreds of thousands of particles, manually picking that many particles is often too time-consuming. While semi-automated particle picking is currently a popular approach, it may introduce manual bias into the selection process. In addition, semi-automated particle picking is still somewhat time-consuming. This paper presents the APPLE (Automatic Particle Picking with Low user Effort) picker, a simple and novel approach for fast, accurate, and fully automatic particle picking. While our approach was inspired by template matching, it is completely template-free. This approach is evaluated on publicly available datasets containing micrographs of β-galactosidase and keyhole limpet hemocyanin projections.

Non-convex Global Minimization and False Discovery Rate Control for the TREX

J. Bien, Irina Gaynanova, Johannes Lederer, C. Müller

The TREX is a recently introduced method for performing sparse high-dimensional regression. Despite its statistical promise as an alternative to the lasso, square-root lasso, and scaled lasso, the TREX is computationally challenging in that it requires solving a nonconvex optimization problem. This article shows a remarkable result: despite the nonconvexity of the TREX problem, there exists a polynomial-time algorithm that is guaranteed to find the global minimum. This result adds the TREX to a very short list of nonconvex optimization problems that can be globally optimized (principal components analysis being a famous example). After deriving and developing this new approach, we demonstrate that (i) the ability of the preexisting TREX heuristic to reach the global minimum is strongly dependent on the difficulty of the underlying statistical problem, (ii) the new polynomial-time algorithm for TREX permits a novel variable ranking and selection scheme, and (iii) this scheme can be incorporated into a rule that controls the false discovery rate (FDR) of included features in the model. To achieve this last aim, we provide an extension of the results of Barber and Candès to establish that the knockoff filter framework can be applied to the TREX. This investigation thus provides both a rare case study of a heuristic for nonconvex optimization and a novel way of exploiting nonconvexity for statistical inference.

Perspective Functions: Proximal Calculus and Applications in High-Dimensional Statistics

Patrick L Combettes, C. Müller

Perspective functions arise explicitly or implicitly in various forms in applied mathematics and in statistical data analysis. To date, no systematic strategy is available to solve the associated, typically nonsmooth, optimization problems. In this paper, we fill this gap by showing that proximal methods provide an efficient framework to model and solve problems involving perspective functions. We study the construction of the proximity operator of a perspective function under general assumptions and present important instances in which the proximity operator can be computed explicitly or via straightforward numerical operations. These results constitute central building blocks in the design of proximal optimization algorithms. We showcase the versatility of the framework by designing novel proximal algorithms for state-of-the-art regression and variable selection schemes in high-dimensional statistics.

Fungi stabilize connectivity in the lung and skin microbial ecosystems

Laura Tipton, C. Müller, Zachary D Kurtz, Laurence Huang, Eric Kleerup, Alison Morris, R. Bonneau, Elodie Ghedin

Background: No microbe exists in isolation, and few live in environments with only members of their own kingdom or domain. As microbiome studies become increasingly more interested in the interactions between microbes than in cataloging which microbes are present, the variety of microbes in the community should be considered. However, the majority of ecological interaction networks for microbiomes built to date have included only bacteria. Joint association inference across multiple domains of life, e.g., fungal communities (the mycobiome) and bacterial communities, has remained largely elusive. Results: Here, we present a novel extension of the SParse InversE Covariance estimation for Ecological ASsociation Inference (SPIEC-EASI) framework that allows statistical inference of cross-domain associations from targeted amplicon sequencing data. For human lung and skin micro- and mycobiomes, we show that cross-domain networks exhibit higher connectivity, increased network stability, and similar topological re-organization patterns compared to single-domain networks. We also validate in vitro a small number of cross-domain interactions predicted by the skin association network. Conclusions: For the human lung and skin micro- and mycobiomes, our findings suggest that fungi play a stabilizing role in ecological network organization. Our study suggests that computational efforts to infer association networks that include all forms of microbial life, paired with large-scale culture-based association validation experiments, will help formulate concrete hypotheses about the underlying biological mechanisms of species interactions and, ultimately, help understand microbial communities as a whole.
