CCM: Publications

Extended playing techniques: the next milestone in musical instrument recognition

Vincent Lostanlen, J. Andén, Mathieu Lagrange

The expressive variability in producing a musical note conveys information essential to the modeling of orchestration and style. As such, it plays a crucial role in computer-assisted browsing of massive digital music corpora. Yet, although the automatic recognition of a musical instrument from the recording of a single "ordinary" note is considered a solved problem, automatic identification of instrumental playing technique (IPT) remains largely underdeveloped. We benchmark machine listening systems for query-by-example browsing among 143 extended IPTs for 16 instruments, amounting to 469 triplets of instrument, mute, and technique. We identify and discuss three necessary conditions for significantly outperforming the traditional mel-frequency cepstral coefficient (MFCC) baseline: the addition of second-order scattering coefficients to account for amplitude modulation, the incorporation of long-range temporal dependencies, and metric learning using large-margin nearest neighbors (LMNN) to reduce intra-class variability. Evaluating on the Studio On Line (SOL) dataset, we obtain a precision at rank 5 of 99.7\% for instrument recognition (baseline at 89.0\%) and of 61.0\% for IPT recognition (baseline at 44.5\%). We interpret this gain through a qualitative assessment of practical usability and visualization using nonlinear dimensionality reduction.

Show Abstract

A unified integral equation scheme for doubly-periodic Laplace and Stokes boundary value problems in two dimensions

A. Barnett, G Marple, S. Veerapaneni, L Zhao

We present a spectrally-accurate scheme to turn a boundary integral formulation for an elliptic PDE on a single unit cell geometry into one for the fully periodic problem. Applications include computing the effective permeability of composite media (homogenization), and microfluidic chip design. Our basic idea is to exploit a small least squares solve to apply periodicity without ever handling periodic Green's functions. We exhibit fast solvers for the two-dimensional (2D) doubly-periodic Neumann Laplace problem (flow around insulators), and Stokes non-slip fluid flow problem, that for inclusions with smooth boundaries achieve 12-digit accuracy, and can handle thousands of inclusions per unit cell. We split the infinite sum over the lattice of images into a directly-summed "near" part plus a small number of auxiliary sources which represent the (smooth) remaining "far" contribution. Applying physical boundary conditions on the unit cell walls gives an expanded linear system, which, after a rank-1 or rank-3 correction and a Schur complement, leaves a well-conditioned square system which can be solved iteratively using fast multipole acceleration plus a low-rank term. We are rather explicit about the consistency and nullspaces of both the continuous and discretized problems. The scheme is simple (no lattice sums, Ewald methods, nor particle meshes are required), allows adaptivity, and is essentially dimension- and PDE-independent, so would generalize without fuss to 3D and to other non-oscillatory elliptic problems such as elastostatics. We incorporate recently developed spectral quadratures that accurately handle close-to-touching geometries. We include many numerical examples, and provide a software implementation.

Show Abstract

Structural Variability from Noisy Tomographic Projections

J. Andén, A. Singer

In cryo-electron microscopy, the three-dimensional (3D) electric potentials of an ensemble of molecules are projected along arbitrary viewing directions to yield noisy two-dimensional images. The volume maps representing these potentials typically exhibit a great deal of structural variability, which is described by their 3D covariance matrix. Typically, this covariance matrix is approximately low rank and can be used to cluster the volumes or estimate the intrinsic geometry of the conformation space. We formulate the estimation of this covariance matrix as a linear inverse problem, yielding a consistent least-squares estimator. For $n$ images of size N-by-N pixels, we propose an algorithm for calculating this covariance estimator with computational complexity O(n * N^4+√κ * N^6 * log(N)), where the condition number κ is empirically in the range 10-200. Its efficiency relies on the observation that the normal equations are equivalent to a deconvolution problem in six dimensions. This is then solved by the conjugate gradient method with an appropriate circulant preconditioner. The result is the first computationally efficient algorithm for consistent estimation of the 3D covariance from noisy projections. It also compares favorably in runtime with respect to previously proposed nonconsistent estimators. Motivated by the recent success of eigenvalue shrinkage procedures for high-dimensional covariance matrix estimation, we incorporate a shrinkage procedure that improves accuracy at lower signal-to-noise ratios. We evaluate our methods on simulated datasets and achieve classification results comparable to state-of-the-art methods in shorter running time. We also present results on clustering volumes in an experimental dataset, illustrating the power of the proposed algorithm for practical determination of structural variability.

Show Abstract

Integral equation methods for electrostatics, acoustics and electromagnetics in smoothly varying, anisotropic media

Lise-Marie Imbert-Gerard, Felipe Vico, L. Greengard, Miguel Ferrando

We present a collection of well-conditioned integral equation methods for the solution of electrostatic, acoustic or electromagnetic scattering problems involving anisotropic, inhomogeneous media. In the electromagnetic case, our approach involves a minor modification of a classical formulation. In the electrostatic or acoustic setting, we introduce a new vector partial differential equation, from which the desired solution is easily obtained. It is the vector equation for which we derive a well-conditioned integral equation. In addition to providing a unified framework for these solvers, we illustrate their performance using iterative solution methods coupled with the FFT-based technique of [1] to discretize and apply the relevant integral operators.

Show Abstract

Prediction error bounds for linear regression with the TREX

Jacob Bien, Irina Gaynanova, Johannes Lederer, C. Müller

The TREX is a recently introduced approach to sparse linear regression. In contrast to most well-known approaches to penalized regression, the TREX can be formulated without the use of tuning parameters. In this paper, we establish the first known prediction error bounds for the TREX. Additionally, we introduce extensions of the TREX to a more general class of penalties, and we provide a bound on the prediction error in this generalized setting. These results deepen the understanding of the TREX from a theoretical perspective and provide new insights into penalized regression in general.

Show Abstract

An adaptive fast Gauss transform in two dimensions

J. Wang, L. Greengard

A variety of problems in computational physics and engineering require the convolution of the heat kernel (a Gaussian) with either discrete sources, densities supported on boundaries, or continuous volume distributions. We present a unified fast Gauss transform for this purpose in two dimensions, making use of an adaptive quad-tree discretization on a unit square which is assumed to contain all sources. Our implementation permits either free-space or periodic boundary conditions to be imposed, and is efficient for any choice of variance in the Gaussian.

Show Abstract

Decoupled field integral equations for electromagnetic scattering from homogeneous penetrable obstacles

Felipe Vico, L. Greengard, Miguel Ferrando

We present a new method for the analysis of electromagnetic scattering from homogeneous penetrable bodies. Our approach is based on a reformulation of the governing Maxwell equations in terms of two uncoupled vector Helmholtz systems: one for the electric feld and one for the magnetic field. This permits the derivation of resonance-free Fredholm equations of the second kind that are stable at all frequencies, insensitive to the genus of the scatterers, and invertible for all passive materials including those with negative permittivities or permeabilities. We refer to these as decoupled field integral equations.

Show Abstract

Temporal probabilistic modeling of bacterial compositions derived from 16S rRNA sequencing

Tarmo Äijö, C. Müller, R. Bonneau

The number of microbial and metagenomic studies has increased drastically due to advancements in next-generation sequencing-based measurement techniques. Statistical analysis and the validity of conclusions drawn from (time series) 16S rRNA and other metagenomic sequencing data is hampered by the presence of significant amount of noise and missing data (sampling zeros). Accounting uncertainty in microbiome data is often challenging due to the difficulty of obtaining biological replicates. Additionally, the compositional nature of current amplicon and metagenomic data differs from many other biological data types adding another challenge to the data analysis. To address these challenges in human microbiome research, we introduce a novel probabilistic approach to explicitly model overdispersion and sampling zeros by considering the temporal correlation between nearby time points using Gaussian Processes. The proposed Temporal Gaussian Process Model for Compositional Data Analysis (TGP-CODA) shows superior modeling performance compared to commonly used Dirichlet-multinomial, multinomial, and non-parametric regression models on real and synthetic data. We demonstrate that the nonreplicative nature of human gut microbiota studies can be partially overcome by our method with proper experimental design of dense temporal sampling. We also show that different modeling approaches have a strong impact on ecological interpretation of the data, such as stationarity, persistence, and environmental noise models. A Stan implementation of the proposed method is available under MIT license at

Show Abstract

APPLE Picker: Automatic Particle Picking, a Low-Effort Cryo-EM Framework

Ayelet Heimowitz, J. Andén, A. Singer

Particle picking is a crucial first step in the computational pipeline of single-particle cryo-electron microscopy (cryo-EM). Selecting particles from the micrographs is difficult especially for small particles with low contrast. As high-resolution reconstruction typically requires hundreds of thousands of particles, manually picking that many particles is often too time-consuming. While semi-automated particle picking is currently a popular approach, it may suffer from introducing manual bias into the selection process. In addition, semi-automated particle picking is still somewhat time-consuming. This paper presents the APPLE (Automatic Particle Picking with Low user Effort) picker, a simple and novel approach for fast, accurate, and fully automatic particle picking. While our approach was inspired by template matching, it is completely template-free. This approach is evaluated on publicly available datasets containing micrographs of β-galactosidase and keyhole limpet hemocyanin projections.

Show Abstract

Non-convex Global Minimization and False Discovery Rate Control for the TREX

J. Bien, Irina Gaynanova, Johannes Lederer, C. Müller

The TREX is a recently introduced method for performing sparse high-dimensional regression. Despite its statistical promise as an alternative to the lasso, square-root lasso, and scaled lasso, the TREX is computationally challenging in that it requires solving a nonconvex optimization problem. This article shows a remarkable result: despite the nonconvexity of the TREX problem, there exists a polynomial-time algorithm that is guaranteed to find the global minimum. This result adds the TREX to a very short list of nonconvex optimization problems that can be globally optimized (principal components analysis being a famous example). After deriving and developing this new approach, we demonstrate that (i) the ability of the preexisting TREX heuristic to reach the global minimum is strongly dependent on the difficulty of the underlying statistical problem, (ii) the new polynomial-time algorithm for TREX permits a novel variable ranking and selection scheme, (iii) this scheme can be incorporated into a rule that controls the false discovery rate (FDR) of included features in the model. To achieve this last aim, we provide an extension of the results of Barber and Candes to establish that the knockoff filter framework can be applied to the TREX. This investigation thus provides both a rare case study of a heuristic for nonconvex optimization and a novel way of exploiting nonconvexity for statistical inference.

Show Abstract