CCM: Publications

Compressed sensing and optimal denoising of monotone signals

We consider the problems of compressed sensing and optimal denoising for signals $\mathbf{x_0}\in\mathbb{R}^N$ that are monotone, i.e., $\mathbf{x_0}(i+1) \geq \mathbf{x_0}(i)$, and sparsely varying, i.e., $\mathbf{x_0}(i+1) > \mathbf{x_0}(i)$ only for a small number $k$ of indices $i$. We approach the compressed sensing problem by minimizing the total variation norm restricted to the class of monotone signals subject to equality constraints obtained from a number of measurements $A\mathbf{x_0}$. For random Gaussian sensing matrices $A\in\mathbb{R}^{m\times N}$ we derive a closed form expression for the number of measurements $m$ required for successful reconstruction with high probability. We show that the probability undergoes a phase transition as $m$ varies, and depends not only on the number of change points, but also on their location. For denoising we regularize with the same norm and derive a formula for the optimal regularizer weight that depends only mildly on $\mathbf{x_0}$. We obtain our results using the statistical dimension tool.

Show Abstract

Fused regression for multi-source gene regulatory network inference

K Lam, Z Westrick, C. Müller, L Christiaen, R. Bonneau

Understanding gene regulatory networks is critical to understanding cellular differentiation and response to external stimuli. Methods for global network inference have been developed and applied to a variety of species. Most approaches consider the problem of network inference independently in each species, despite evidence that gene regulation can be conserved even in distantly related species. Further, network inference is often confined to single data-types (single platforms) and single cell types. We introduce a method for multi-source network inference that allows simultaneous estimation of gene regulatory networks in multiple species or biological processes through the introduction of priors based on known gene relationships such as orthology incorporated using fused regression. This approach improves network inference performance even when orthology mapping and conservation are incomplete. We refine this method by presenting an algorithm that extracts the true conserved subnetwork from a larger set of potentially conserved interactions and demonstrate the utility of our method in cross species network inference. Last, we demonstrate our method’s utility in learning from data collected on different experimental platforms.

Show Abstract

Fast convolution with free-space Green’s functions

Felipe Vico, L. Greengard, Miguel Ferrando

We introduce a fast algorithm for computing volume potentials - that is, the convolution of a translation invariant, free-space Green's function with a compactly supported source distribution defined on a uniform grid. The algorithm relies on regularizing the Fourier transform of the Green's function by cutting off the interaction in physical space beyond the domain of interest. This permits the straightforward application of trapezoidal quadrature and the standard FFT, with superalgebraic convergence for smooth data. Moreover, the method can be interpreted as employing a Nystrom discretization of the corresponding integral operator, with matrix entries which can be obtained explicitly and rapidly. This is of use in the design of preconditioners or fast direct solvers for a variety of volume integral equations. The method proposed permits the computation of any derivative of the potential, at the cost of an additional FFT.

Show Abstract

Integral Equation Methods for Elastance and Mobility Problems in Two Dimensions

M. Rachh, L. Greengard

We present new integral representations in two dimensions for the elastance problem in electrostatics and the mobility problem in Stokes flow. These representations lead to resonance-free Fredholm integral equations of the second kind and well conditioned linear systems upon discretization. By coupling our integral equations with high order quadrature and fast multipole acceleration, large-scale problems can be solved with only modest computing resources. We also discuss some applications of these boundary value problems in applied physics.

Show Abstract

On the stability of time-domain integral equations for acoustic wave propagation

Charles L. Epstein, L. Greengard, Thomas Hagstrom

We give a principled approach for the selection of a boundary integral, retarded potential representation for the solution of scattering problems for the wave equation in an exterior domain.

Show Abstract

Fast, adaptive, high order accurate discretization of the Lippmann-Schwinger equation in two dimension

Sivaram Ambikasaran, Carlos Borges, Lise-Marie Imbert-Gerard, L. Greengard

We present a fast direct solver for two dimensional scattering problems, where an incident wave impinges on a penetrable medium with compact support. We represent the scattered field using a volume potential whose kernel is the outgoing Green's function for the exterior domain. Inserting this representation into the governing partial differential equation, we obtain an integral equation of the Lippmann-Schwinger type. The principal contribution here is the development of an automatically adaptive, high-order accurate discretization based on a quad tree data structure which provides rapid access to arbitrary elements of the discretized system matrix. This permits the straightforward application of state-of-the-art algorithms for constructing compressed versions of the solution operator. These solvers typically require $O(N^{3/2})$ work, where $N$ denotes the number of degrees of freedom. We demonstrate the performance of the method for a variety of problems in both the low and high frequency regimes.

Show Abstract

Generalized Stability Approach for Regularized Graphical Models

C. Müller, R. Bonneau, Z Kurtz

Selecting regularization parameters in penalized high-dimensional graphical models in a principled, data-driven, and computationally efficient manner continues to be one of the key challenges in high-dimensional statistics. We present substantial computational gains and conceptual generalizations of the Stability Approach to Regularization Selection (StARS), a state-of-the-art graphical model selection scheme. Using properties of the Poisson-Binomial distribution and convex non-asymptotic distributional modeling we propose lower and upper bounds on the StARS graph regularization path which results in greatly reduced computational cost without compromising regularization selection. We also generalize the StARS criterion from single edge to induced subgraph (graphlet) stability. We show that simultaneously requiring edge and graphlet stability leads to superior graph recovery performance independent of graph topology. These novel insights render Gaussian graphical model selection a routine task on standard multi-core computers.

Show Abstract

Population-Level Representation of a Temporal Sequence Underlying Song Production in the Zebra Finch

Michel A. Picardo, Josh Merel, Kalman A. Katlowitz , Daniela Vallentin , Daniel E. Okobi, Sam E. Benezra, Rachel C. Clary, E. Pnevmatikakis, Liam Paninski, Michael A. Long

The zebra finch brain features a set of clearly defined and hierarchically arranged motor nuclei that are selectively responsible for producing singing behavior. One of these regions, a critical forebrain structure called HVC, contains premotor neurons that are active at precise time points during song production. However, the neural representation of this behavior at a population level remains elusive. We used two-photon microscopy to monitor ensemble activity during singing, integrating across multiple trials by adopting a Bayesian inference approach to more precisely estimate burst timing. Additionally, we examined spiking and motor-related synaptic inputs using intracellular recordings during singing. With both experimental approaches, we find that premotor events do not occur preferentially at the onsets or offsets of song syllables or at specific subsyllabic motor landmarks. These results strongly support the notion that HVC projection neurons collectively exhibit a temporal sequence during singing that is uncoupled from ongoing movements.

Show Abstract

Validation of neural spike sorting algorithms without ground-truth information

A. Barnett, J. Magland, L. Greengard

Background

The throughput of electrophysiological recording is growing rapidly, allowing thousands of simultaneous channels, and there is a growing variety of spike sorting algorithms designed to extract neural firing events from such data. This creates an urgent need for standardized, automatic evaluation of the quality of neural units output by such algorithms.

New method

We introduce a suite of validation metrics that assess the credibility of a given automatic spike sorting algorithm applied to a given dataset. By rerunning the spike sorter two or more times, the metrics measure stability under various perturbations consistent with variations in the data itself, making no assumptions about the internal workings of the algorithm, and minimal assumptions about the noise.

Results

We illustrate the new metrics on standard sorting algorithms applied to both in vivo and ex vivo recordings, including a time series with overlapping spikes. We compare the metrics to existing quality measures, and to ground-truth accuracy in simulated time series. We provide a software implementation.
Comparison with existing methods

Metrics have until now relied on ground-truth, simulated data, internal algorithm variables (e.g. cluster separation), or refractory violations. By contrast, by standardizing the interface, our metrics assess the reliability of any automatic algorithm without reference to internal variables (e.g. feature space) or physiological criteria.
Conclusions

Stability is a prerequisite for reproducibility of results. Such metrics could reduce the significant human labor currently spent on validation, and should form an essential part of large-scale automated spike sorting and systematic benchmarking of algorithms.

Show Abstract

4C-ker: a method to reproducibly identify genome-wide interactions captured by 4C-Seq experiments

R Raviram, P Rocha, C. Müller, E. Miraldi, S Badri, Y Fu, E Swanzey, C Proudhon, V Snetkova, R. Bonneau, J Skok

4C-Seq has proven to be a powerful technique to identify genome-wide interactions with a single locus of interest (or “bait”) that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is the differences in resolution of signal across the genome that result from differences in 3D distance separation from the bait. This leads to the highest signal in the region immediately surrounding the bait and increasingly lower signals in far-cis and trans. Another important aspect of 4C-Seq experiments is the resolution, which is greatly influenced by the choice of restriction enzyme and the frequency at which it can cut the genome. Thus, it is important that a 4C-Seq analysis method is flexible enough to analyze data generated using different enzymes and to identify interactions across the entire genome. Current methods for 4C-Seq analysis only identify interactions in regions near the bait or in regions located in far-cis and trans, but no method comprehensively analyzes 4C signals of different length scales. In addition, some methods also fail in experiments where chromatin fragments are generated using frequent cutter restriction enzymes. Here, we describe 4C-ker, a Hidden-Markov Model based pipeline that identifies regions throughout the genome that interact with the 4C bait locus. In addition, we incorporate methods for the identification of differential interactions in multiple 4C-seq datasets collected from different genotypes or experimental conditions. Adaptive window sizes are used to correct for differences in signal coverage in near-bait regions, far-cis and trans chromosomes. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with different resolution enzymes.

Show Abstract