CCM: Publications

Robust classification of protein variation using structural modelling and large-scale data integration

E Baugh, R Simmons-Edler, C. Müller, R Alford, N. Volfovsky, R. Bonneau

Existing methods for interpreting protein variation focus on annotating mutation pathogenicity rather than detailed interpretation of variant deleteriousness and frequently use only sequence-based or structure-based information. We present VIPUR, a computational framework that seamlessly integrates sequence analysis and structural modelling (using the Rosetta protein modelling suite) to identify and interpret deleterious protein variants. To train VIPUR, we collected 9477 protein variants with known effects on protein function from multiple organisms and curated structural models for each variant from crystal structures and homology models. VIPUR can be applied to mutations in any organism's proteome with improved generalized accuracy (AUROC .83) and interpretability (AUPR .87) compared to other methods. We demonstrate that VIPUR's predictions of deleteriousness match the biological phenotypes in ClinVar and provide a clear ranking of prediction confidence. We use VIPUR to interpret known mutations associated with inflammation and diabetes, demonstrating the structural diversity of disrupted functional sites and improved interpretation of mutations associated with human diseases. Lastly, we demonstrate VIPUR's ability to highlight candidate variants associated with human diseases by applying VIPUR to de novo variants associated with autism spectrum disorders.

Inferring causal molecular networks: empirical assessment through a community-based effort

Steven M Hill, Laura M Heiser, Thomas Cokelaer, Michael Unger, Nicole K Nesser, Daniel E Carlin, Yang Zhang, Artem Sokolov, Evan O Paull, Chris K Wong, C. Müller, et al.

It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.

Fast Direct Methods for Gaussian Processes

Sivaram Ambikasaran, Daniel Foreman-Mackey, L. Greengard, David W. Hogg, Michael O'Neil

A number of problems in probability and statistics can be addressed using the multivariate normal (Gaussian) distribution. In the one-dimensional case, computing the probability for a given mean and variance simply requires the evaluation of the corresponding Gaussian density. In the $n$-dimensional setting, however, it requires the inversion of an $n \times n$ covariance matrix, $C$, as well as the evaluation of its determinant, $\det(C)$. In many cases, such as regression using Gaussian processes, the covariance matrix is of the form $C = \sigma^2 I + K$, where $K$ is computed using a specified covariance kernel which depends on the data and additional parameters (hyperparameters). The matrix $C$ is typically dense, causing standard direct methods for inversion and determinant evaluation to require $\mathcal{O}(n^3)$ work. This cost is prohibitive for large-scale modeling. Here, we show that for the most commonly used covariance functions, the matrix $C$ can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an $\mathcal{O}(n \log^2 n)$ algorithm for inversion. More importantly, we show that this factorization enables the evaluation of the determinant $\det(C)$, permitting the direct calculation of probabilities in high dimensions under fairly broad assumptions on the kernel defining $K$. Our fast algorithm brings many problems in marginalization and the adaptation of hyperparameters within practical reach using a single CPU core. The combination of nearly optimal scaling in terms of problem size with high-performance computing resources will permit the modeling of previously intractable problems. We illustrate the performance of the scheme on standard covariance kernels.
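
For orientation, the dense $\mathcal{O}(n^3)$ baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration, not the paper's hierarchical factorization: it evaluates the zero-mean Gaussian log-probability for $C = \sigma^2 I + K$ with a plain Cholesky factorization. The squared-exponential kernel, the scalar inputs, and all function names are assumptions made for the example.

```python
import math


def sq_exp_kernel(x, y, amplitude=1.0, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return amplitude ** 2 * math.exp(-0.5 * ((x - y) / length_scale) ** 2)


def build_covariance(xs, sigma, kernel=sq_exp_kernel):
    """Assemble the dense matrix C = sigma^2 I + K."""
    n = len(xs)
    return [[kernel(xs[i], xs[j]) + (sigma ** 2 if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(C):
    """Return lower-triangular L with C = L L^T; this costs O(n^3) work."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gp_log_likelihood(xs, ys, sigma):
    """log N(y | 0, C) = -1/2 y^T C^{-1} y - 1/2 log det(C) - n/2 log(2 pi)."""
    n = len(xs)
    L = cholesky(build_covariance(xs, sigma))
    # Forward substitution solves L z = y, so that y^T C^{-1} y = z^T z.
    z = []
    for i in range(n):
        z.append((ys[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    # det(C) is the squared product of the diagonal of L.
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)
```

Both the quadratic form and $\log\det(C)$ come from the same factorization, which is why a faster factorization of $C$ speeds up the whole probability evaluation.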

Simultaneous Denoising, Deconvolution, and Demixing of Calcium Imaging Data

E. Pnevmatikakis, Daniel Soudry, Yuanjun Gao, Timothy A Machado, Josh Merel, David Pfau, Thomas Reardon, Yu Mu, Clay Lacefield, Weijian Yang, Misha Ahrens, Randy Bruno, Thomas M Jessell, Darcy S Peterka, Rafael Yuste, Liam Paninski

We present a modular approach for analyzing calcium imaging recordings of large neuronal ensembles. Our goal is to simultaneously identify the locations of the neurons, demix spatially overlapping components, and denoise and deconvolve the spiking activity from the slow dynamics of the calcium indicator. Our approach relies on a constrained nonnegative matrix factorization that expresses the spatiotemporal fluorescence activity as the product of a spatial matrix that encodes the spatial footprint of each neuron in the optical field and a temporal matrix that characterizes the calcium concentration of each neuron over time. This framework is combined with a novel constrained deconvolution approach that extracts estimates of neural activity from fluorescence traces, to create a spatiotemporal processing algorithm that requires minimal parameter tuning. We demonstrate the general applicability of our method by applying it to in vitro and in vivo multi-neuronal imaging data, whole-brain light-sheet imaging data, and dendritic imaging data.

Unimodal clustering using isotonic regression: ISO-SPLIT

J. Magland, A. Barnett

A limitation of many clustering algorithms is the requirement to tune adjustable parameters for each application or even for each dataset. Some techniques require an a priori estimate of the number of clusters, while density-based techniques usually require a scale parameter. Other parametric methods, such as mixture modeling, make assumptions about the underlying cluster distributions. Here we introduce a non-parametric clustering method that does not involve tunable parameters and only assumes that clusters are unimodal, in the sense that they have a single point of maximal density when projected onto any line, and that clusters are separated from one another by a separating hyperplane of relatively lower density. The technique uses a non-parametric variant of Hartigan's dip statistic using isotonic regression as the kernel operation repeated at every iteration. We compare the method against k-means++, DBSCAN, and Gaussian mixture methods and show in simulations that it performs better than these standard methods in many situations. The algorithm is suited for low-dimensional datasets with a large number of observations, and was motivated by the problem of "spike sorting" in neural electrical recordings. Source code is freely available.
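
The isotonic regression named above as the kernel operation can be illustrated with the standard pool-adjacent-violators algorithm. This is a sketch of that building block only, not of ISO-SPLIT or its dip-statistic variant; the abstract does not say which isotonic solver the authors use, and the function name here is hypothetical.

```python
def isotonic_regression(values, weights=None):
    """Weighted least-squares non-decreasing fit via pool-adjacent-violators."""
    n = len(values)
    if weights is None:
        weights = [1.0] * n
    # Each block stores [weighted mean, total weight, number of points pooled].
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Pool backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the pooled blocks back to one fitted value per input point.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit
```

For example, fitting `[1, 3, 2, 4]` pools the out-of-order pair `3, 2` into their mean and leaves the other points unchanged.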

Primacy of Flexor Locomotor Pattern Revealed by Ancestral Reversion of Motor Neuron Identity

Timothy A Machado, E. Pnevmatikakis, Liam Paninski, Thomas M Jessell, Andrew Miri

Spinal circuits can generate locomotor output in the absence of sensory or descending input, but the principles of locomotor circuit organization remain unclear. We sought insight into these principles by considering the elaboration of locomotor circuits across evolution. The identity of limb-innervating motor neurons was reverted to a state resembling that of motor neurons that direct undulatory swimming in primitive aquatic vertebrates, permitting assessment of the role of motor neuron identity in determining locomotor pattern. Two-photon imaging was coupled with spike inference to measure locomotor firing in hundreds of motor neurons in isolated mouse spinal cords. In wild-type preparations, we observed sequential recruitment of motor neurons innervating flexor muscles controlling progressively more distal joints. Strikingly, after reversion of motor neuron identity, virtually all firing patterns became distinctly flexor-like. Our findings show that motor neuron identity directs locomotor circuit wiring and indicate the evolutionary primacy of flexor pattern generation.

Simple and efficient representations for the fundamental solutions of Stokes flow in a half-space

Zydrunas Gimbutas, L. Greengard, Shravan Veerapaneni

We derive new formulae for the fundamental solutions of slow viscous flow, governed by the Stokes equations, in a half-space. They are simpler than the classical representations obtained by Blake and collaborators, and can be efficiently implemented using existing fast solver libraries. We show, for example, that the velocity field induced by a Stokeslet can be annihilated on the boundary (to establish a zero-slip condition) using a single reflected Stokeslet combined with a single Papkovich–Neuber potential that involves only a scalar harmonic function. The new representation has a physically intuitive interpretation.

Variable metric random pursuit

Sebastian U Stich, C. Müller, Bernd Gärtner

We consider unconstrained randomized optimization of smooth convex objective functions in the gradient-free setting. We analyze Random Pursuit (RP) algorithms with fixed (F-RP) and variable metric (V-RP). The algorithms only use zeroth-order information about the objective function and compute an approximate solution by repeated optimization over randomly chosen one-dimensional subspaces. The distribution of search directions is dictated by the chosen metric. Variable Metric RP uses novel variants of a randomized zeroth-order Hessian approximation scheme recently introduced by Leventhal and Lewis (D. Leventhal and A. S. Lewis, Optimization 60(3), 329--345, 2011). Here we present (i) a refined analysis of the expected single-step progress of RP algorithms and their global convergence on (strictly) convex functions and (ii) novel convergence bounds for V-RP on strongly convex functions. We also quantify how well the employed metric needs to match the local geometry of the function in order for the RP algorithms to converge with the best possible rate. Our theoretical results are accompanied by numerical experiments, comparing V-RP with the derivative-free schemes CMA-ES, Implicit Filtering, Nelder-Mead, NEWUOA, Pattern-Search and Nesterov's gradient-free algorithms.
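
The basic iteration described above (pick a random one-dimensional subspace, minimize along it using function values only, repeat) can be sketched as follows. This is an illustrative fixed-metric variant with the identity metric, i.e. directions drawn uniformly from the unit sphere. The golden-section line search, the bracketing interval `step_bound`, and all names are assumptions for the example rather than the paper's choices, and the variable-metric Hessian update is not shown.

```python
import math
import random


def golden_section(phi, lo, hi, tol=1e-8):
    """Minimize a unimodal one-dimensional function phi on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            # The minimizer lies in [a, d]; reuse c as the new upper probe.
            b, d = d, c
            c = b - ratio * (b - a)
        else:
            # The minimizer lies in [c, b]; reuse d as the new lower probe.
            a, c = c, d
            d = a + ratio * (b - a)
    return 0.5 * (a + b)


def random_pursuit(f, x0, iterations=200, step_bound=10.0, seed=0):
    """Fixed-metric Random Pursuit sketch using only zeroth-order information."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iterations):
        # Normalized Gaussian vector: uniform direction on the unit sphere.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
        # Approximate line search along x + t * u (assumes |t*| <= step_bound).
        t = golden_section(
            lambda s: f([xi + s * ui for xi, ui in zip(x, u)]),
            -step_bound, step_bound)
        x = [xi + t * ui for xi, ui in zip(x, u)]
    return x
```

A quick check is to run it on a small convex quadratic such as `f(x) = sum((i + 1) * x[i] ** 2)` and confirm that the objective decreases towards zero.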

Inverse Obstacle Scattering in Two Dimensions with Multiple Frequency Data and Multiple Angles of Incidence

Carlos Borges, L. Greengard

We consider the problem of reconstructing the shape of an impenetrable sound-soft obstacle from scattering measurements. The input data is assumed to be the far-field pattern generated when a plane wave impinges on an unknown obstacle from one or more directions and at one or more frequencies. It is well known that this inverse scattering problem is both ill-posed and nonlinear. It is common practice to overcome the ill-posedness through the use of a penalty method or Tikhonov regularization. Here, we present a more physical regularization, based simply on restricting the unknown boundary to be band-limited in a suitable sense. To overcome the nonlinearity of the problem, we use a variant of Newton's method. When multiple-frequency data is available, we supplement Newton's method with the recursive linearization approach due to Chen. During the course of solving the inverse problem, we need to compute the solution to a large number of forward scattering problems. For this, we use high-order accurate integral equation discretizations, coupled with fast direct solvers when the problem is sufficiently large.

A Fast Direct Solver for High Frequency Scattering from a Large Cavity in Two Dimensions

Jun Lai, Sivaram Ambikasaran, L. Greengard

We present a fast direct solver for the simulation of electromagnetic scattering from an arbitrarily shaped, large, empty cavity embedded in an infinite perfectly conducting half-space. The governing Maxwell equations are reformulated as a well-conditioned second-kind integral equation, and the resulting linear system is solved in nearly linear time using a hierarchical matrix factorization technique. We illustrate the performance of the scheme with several numerical examples for complex cavity shapes over a wide range of frequencies.
