2005 Publications

Fast multipole methods for evaluation of layer potentials with locally-corrected quadratures

L. Greengard, Michael O'Neil, M. Rachh, Felipe Vico

While fast multipole methods (FMMs) are in widespread use for the rapid evaluation of potential fields governed by the Laplace, Helmholtz, Maxwell or Stokes equations, their coupling to high-order quadratures for evaluating layer potentials is still an area of active research. In three dimensions, a number of issues need to be addressed, including the specification of the surface as the union of high-order patches, the incorporation of accurate quadrature rules for integrating singular or weakly singular Green's functions on such patches, and their coupling to the oct-tree data structures on which the FMM separates near and far field interactions. Although the latter is straightforward for point distributions, the near field for a patch is determined by its physical dimensions, not the distribution of discretization points on the surface. Here, we present a general framework for efficiently coupling locally corrected quadratures with FMMs, relying primarily on what are called generalized Gaussian quadrature rules, supplemented by adaptive integration. The approach, however, is quite general and easily applicable to other schemes, such as Quadrature by Expansion (QBX). We also introduce a number of accelerations to reduce the cost of quadrature generation itself, and present several numerical examples of acoustic scattering that demonstrate the accuracy, robustness, and computational efficiency of the scheme. On a single core of an Intel i5 2.3 GHz processor, a Fortran implementation of the scheme can generate near field quadrature corrections for between 1,000 and 10,000 points per second, depending on the order of accuracy and the desired precision. A Fortran implementation of the algorithm described in this work is available at https://gitlab.com/fastalgorithms/fmm3dbie.
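
The near-field bookkeeping described above can be made concrete with a short sketch. The following Python fragment is a schematic illustration only, not the fmm3dbie library: the Laplace kernel, the 2x-patch-radius near-field criterion, and the accurate_patch_quad callable are all assumptions for illustration. It evaluates a single-layer potential with a smooth global rule and then swaps in a locally corrected quadrature for targets inside each patch's physical near field.

import numpy as np

def kernel(targets, sources):
    # single-layer Laplace kernel 1/(4*pi*r); broadcasts over leading dimensions
    return 1.0 / (4.0 * np.pi * np.linalg.norm(targets - sources, axis=-1))

def evaluate_layer_potential(targets, patches, accurate_patch_quad):
    # patches: list of dicts with 'nodes' (n,3), 'weights' (n,), 'density' (n,),
    # 'center' (3,) and 'radius' (float); accurate_patch_quad(target, patch) is an
    # assumed, user-supplied routine returning one patch's locally corrected contribution.
    u = np.zeros(len(targets))
    # step 1: smooth-rule sum over all patches; an FMM would perform this step in
    # roughly O(N) time instead of the O(N^2) direct summation used here
    for p in patches:
        u += kernel(targets[:, None, :], p["nodes"]) @ (p["weights"] * p["density"])
    # step 2: local corrections where the smooth rule is inaccurate; the near field
    # is defined by the patch's physical size, not by the discretization points
    for p in patches:
        near = np.linalg.norm(targets - p["center"], axis=1) < 2.0 * p["radius"]
        for i in np.where(near)[0]:
            smooth = kernel(targets[i], p["nodes"]) @ (p["weights"] * p["density"])
            u[i] += accurate_patch_quad(targets[i], p) - smooth
    return u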


ChIP-BIT2: a software tool to detect weak binding events using a Bayesian integration approach

X. Chen, A. Neuwald, L. Hilakivi-Clarke, R. Clarke, J. Xuan

Background
ChIP-seq combines chromatin immunoprecipitation assays with sequencing and identifies genome-wide binding sites for DNA binding proteins. While many binding sites have strong ChIP-seq ‘peak’ observations and are well captured, there are still regions that are only weakly bound by proteins, with relatively low ChIP-seq signal enrichment. These weak binding sites, especially those at promoters and enhancers, are functionally important because they also regulate nearby gene expression. Yet, it remains a challenge to accurately identify weak binding sites in ChIP-seq data due to the ambiguity in differentiating them from amplified background DNA.

Results
ChIP-BIT2 (http://sourceforge.net/projects/chipbitc/) is a software package for ChIP-seq peak detection. ChIP-BIT2 employs a mixture model integrating protein and control ChIP-seq data and predicts strong or weak protein binding sites at promoters, enhancers, or other genomic locations. For binding sites at gene promoters, ChIP-BIT2 simultaneously predicts their target genes. ChIP-BIT2 has been validated on benchmark regions and tested using large-scale ENCODE ChIP-seq data, demonstrating its high accuracy and wide applicability.

Conclusion
ChIP-BIT2 is an efficient ChIP-seq peak caller. It provides a better lens to examine weak binding sites and can refine or extend the existing binding site collection, providing additional regulatory regions for decoding the mechanism of gene expression regulation.
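
As a rough illustration of the mixture-model idea described in the Results above, the Python sketch below fits a three-component Gaussian mixture to the log fold-enrichment of ChIP over control read counts and labels the components as background, weak, or strong binding. This is not ChIP-BIT2's actual Bayesian integration model; the component count, pseudocount, and simulated data are assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

def classify_regions(chip_counts, control_counts, pseudo=1.0):
    # log2 fold enrichment of ChIP over control reads per candidate region
    enrich = np.log2((chip_counts + pseudo) / (control_counts + pseudo)).reshape(-1, 1)
    gmm = GaussianMixture(n_components=3, random_state=0).fit(enrich)
    ranks = np.argsort(np.argsort(gmm.means_.ravel()))        # rank components by mean enrichment
    component_label = np.array(["background", "weak", "strong"])[ranks]
    return component_label[gmm.predict(enrich)], gmm.predict_proba(enrich)

# toy usage on simulated read counts for 650 candidate regions
rng = np.random.default_rng(0)
chip = np.concatenate([rng.poisson(5, 500), rng.poisson(15, 100), rng.poisson(60, 50)])
ctrl = rng.poisson(5, 650)
calls, posteriors = classify_regions(chip, ctrl)
print(dict(zip(*np.unique(calls, return_counts=True))))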

April 15, 2021

Automated and scalable analysis pipelines for voltage imaging datasets

J. Friedrich, E. Pnevmatikakis, C. Cai, A. Singh, M. Hossein Eybposh, K. Podgorski, A. Giovannucci

Voltage imaging enables monitoring neural activity at sub-millisecond and sub-cellular scale, unlocking the study of subthreshold activity, synchrony, and network dynamics with unprecedented spatio-temporal resolution. However, high data rates (>800MB/s) and low signal-to-noise ratios create bottlenecks for analyzing such datasets. Here we present VolPy, an automated and scalable pipeline to pre-process voltage imaging datasets. VolPy features motion correction, memory mapping, automated segmentation, denoising and spike extraction, all built on a highly parallelizable, modular, and extensible framework optimized for memory and speed. To aid automated segmentation, we introduce a corpus of 24 manually annotated datasets from different preparations, brain areas and voltage indicators. We benchmark VolPy against ground truth segmentation, simulations and electrophysiology recordings, and we compare its performance with existing algorithms in detecting spikes. Our results indicate that VolPy’s performance in spike extraction and scalability are state-of-the-art.
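
Because the full VolPy pipeline relies on its own segmentation networks and file formats, the short Python sketch below only illustrates the flavor of the final spike-extraction stage: a generic detrend-and-threshold detector under assumed parameters, not the VolPy implementation. It removes a slow baseline from a fluorescence trace, estimates the noise level robustly, and calls spikes where the residual exceeds a multiple of that noise.

import numpy as np
from scipy.signal import find_peaks

def extract_spikes(trace, fs, detrend_win_s=0.1, thresh_sigma=4.0):
    win = max(1, int(detrend_win_s * fs))
    baseline = np.convolve(trace, np.ones(win) / win, mode="same")   # slow baseline
    resid = trace - baseline
    sigma = np.median(np.abs(resid)) / 0.6745        # robust (MAD-based) noise estimate
    peaks, _ = find_peaks(resid, height=thresh_sigma * sigma)
    return peaks / fs                                 # spike times in seconds

# toy usage: 10 s of simulated data at 1 kHz with three inserted spikes
fs = 1000
t = np.arange(0, 10, 1 / fs)
trace = 0.02 * np.random.default_rng(0).normal(size=t.size) + 0.1 * np.sin(0.5 * t)
for ts in (2.0, 5.5, 8.2):
    trace[int(ts * fs)] += 0.3
print("detected spike times (s):", extract_spikes(trace, fs).round(2))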

2021

VolPy: automated and scalable analysis pipelines for voltage imaging datasets

J. Friedrich, C. Cai, E. Pnevmatikakis, K. Podgorski, A. Giovannucci

Voltage imaging enables monitoring neural activity at sub-millisecond and sub-cellular scale, unlocking the study of subthreshold activity, synchrony, and network dynamics with unprecedented spatio-temporal resolution. However, high data rates (>800MB/s) and low signal-to-noise ratios create bottlenecks for analyzing such datasets. Here we present VolPy, an automated and scalable pipeline to pre-process voltage imaging datasets. VolPy features motion correction, memory mapping, automated segmentation, denoising and spike extraction, all built on a highly parallelizable, modular, and extensible framework optimized for memory and speed. To aid automated segmentation, we introduce a corpus of 24 manually annotated datasets from different preparations, brain areas and voltage indicators. We benchmark VolPy against ground truth segmentation, simulations and electrophysiology recordings, and we compare its performance with existing algorithms in detecting spikes. Our results indicate that VolPy’s performance in spike extraction and scalability are state-of-the-art.

2021

Broadband Multi-wavelength Properties of M87 during the 2017 Event Horizon Telescope Campaign

The Event Horizon Telescope Collaboration, The Fermi Large Area Telescope Collaboration, H.E.S.S. Collaboration, MAGIC Collaboration, VERITAS Collaboration, EAVN Collaboration, J. C. Algaba, J. Anczarski, K. Asada, ..., B. Ripperda, et al.

In 2017, the Event Horizon Telescope (EHT) Collaboration succeeded in capturing the first direct image of the center of the M87 galaxy. The asymmetric ring morphology and size are consistent with theoretical expectations for a weakly accreting supermassive black hole of mass approximately 6.5 × 10^9 M_solar. The EHTC also partnered with several international facilities in space and on the ground to arrange an extensive, quasi-simultaneous multi-wavelength campaign. This Letter presents the results and analysis of this campaign, as well as the multi-wavelength data as a legacy data repository. We captured M87 in a historically low state, and the core flux dominates over HST-1 at high energies, making it possible to combine core flux constraints with the more spatially precise very long baseline interferometry data. We present the most complete simultaneous multi-wavelength spectrum of the active nucleus to date, and discuss the complexity and caveats of combining data from different spatial scales into one broadband spectrum. We apply two heuristic, isotropic leptonic single-zone models to provide insight into the basic source properties, but conclude that a structured jet is necessary to explain M87's spectrum. We can exclude the possibility that the simultaneous gamma-ray emission is produced via inverse Compton emission in the same region producing the EHT mm-band emission, and further conclude that the gamma-rays can only be produced in the inner jets (inward of HST-1) if there are strongly particle-dominated regions. Direct synchrotron emission from accelerated protons and secondaries cannot yet be excluded.


Variational Monte Carlo calculations of A≤4 nuclei with an artificial neural-network correlator ansatz

Corey Adams, G. Carleo, Alessandro Lovato, Noemi Rocco

The complexity of many-body quantum wave functions is a central aspect of several fields of physics and chemistry where non-perturbative interactions are prominent. Artificial neural networks (ANNs) have proven to be a flexible tool to approximate quantum many-body states in condensed matter and chemistry problems. In this work we introduce a neural-network quantum state ansatz to model the ground-state wave function of light nuclei, and approximately solve the nuclear many-body Schrödinger equation. Using efficient stochastic sampling and optimization schemes, our approach extends pioneering applications of ANNs in the field, which present exponentially-scaling algorithmic complexity. We compute the binding energies and point-nucleon densities of A≤4 nuclei as emerging from a leading-order pionless effective field theory Hamiltonian. We successfully benchmark the ANN wave function against more conventional parametrizations based on two- and three-body Jastrow functions, and virtually-exact Green's function Monte Carlo results.
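
To make the variational Monte Carlo ingredients concrete, here is a deliberately tiny Python sketch: a one-particle harmonic oscillator with a neural-network correlator multiplying a Gaussian mean-field factor. This is not the authors' nuclear calculation; the network size, proposal width, and finite-difference local energy are illustrative assumptions, and parameter optimization is omitted.

import numpy as np

rng = np.random.default_rng(0)
# a one-hidden-layer network f_theta(x) acting as the correlator's log-amplitude
W1, b1 = rng.normal(0, 0.1, (8, 1)), np.zeros((8, 1))
W2, b2 = rng.normal(0, 0.1, (1, 8)), np.zeros((1, 1))

def log_psi(x):
    # log trial wave function: Gaussian mean-field factor times exp(f_theta(x))
    h = np.tanh(W1 * x + b1)
    return -0.5 * x**2 + (W2 @ h + b2).item()

def local_energy(x, eps=1e-4):
    # E_L = -(1/2) psi''/psi + V(x) with V(x) = x^2/2; derivatives of log psi by
    # central finite differences, using psi''/psi = (log psi)'' + ((log psi)')^2
    lp, lpp, lpm = log_psi(x), log_psi(x + eps), log_psi(x - eps)
    d1 = (lpp - lpm) / (2.0 * eps)
    d2 = (lpp - 2.0 * lp + lpm) / eps**2
    return -0.5 * (d2 + d1**2) + 0.5 * x**2

# Metropolis sampling of |psi|^2 and a Monte Carlo estimate of the variational energy
x, samples = 0.0, []
for step in range(20000):
    xp = x + rng.normal(0.0, 0.5)
    if rng.random() < np.exp(2.0 * (log_psi(xp) - log_psi(x))):
        x = xp
    if step >= 2000:                      # discard burn-in samples
        samples.append(local_energy(x))
print("variational energy estimate:", np.mean(samples))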


Learning physically consistent differential equation models from data using group sparsity

Suryanarayana Maddu, Bevan L Cheeseman, Ivo F Sbalzarini, C. Müller

We propose a statistical learning framework based on group-sparse regression that can be used to 1) enforce conservation laws, 2) ensure model equivalence, and 3) guarantee symmetries when learning or inferring differential-equation models from measurement data. Directly learning interpretable mathematical models from data has emerged as a valuable modeling approach. However, in areas like biology, high noise levels, sensor-induced correlations, and strong inter-system variability can render data-driven models nonsensical or physically inconsistent without additional constraints on the model structure. Hence, it is important to leverage prior knowledge from physical principles to learn "biologically plausible and physically consistent" models rather than models that simply fit the data best. We present a novel group Iterative Hard Thresholding (gIHT) algorithm and use stability selection to infer physically consistent models with minimal parameter tuning. We show several applications from systems biology that demonstrate the benefits of enforcing priors in data-driven modeling.
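
As a concrete illustration of the group-sparsity mechanism, here is a minimal numpy sketch of group Iterative Hard Thresholding on a toy linear regression. It is not the authors' implementation, and the step size, iteration count, and toy data are assumptions: after each gradient step, only the k groups of coefficients with the largest l2 norm are kept, which is how structural priors such as conservation laws can be encoded as groups of terms that must enter or vanish together.

import numpy as np

def group_iht(X, y, groups, k, step=None, n_iter=500):
    # groups: list of index arrays, one per group; k: number of groups to keep
    n, p = X.shape
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2    # 1/L, L the gradient Lipschitz constant
    w = np.zeros(p)
    for _ in range(n_iter):
        w = w + step * X.T @ (y - X @ w)          # gradient step on 0.5*||y - Xw||^2
        norms = np.array([np.linalg.norm(w[g]) for g in groups])
        keep = np.argsort(norms)[-k:]             # k groups with the largest group norm
        mask = np.zeros(p, dtype=bool)
        for i in keep:
            mask[groups[i]] = True
        w[~mask] = 0.0                            # hard-threshold the remaining groups
    return w

# toy usage: 3 groups of 2 features each, only group 0 is truly active
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
w_true = np.array([1.5, -2.0, 0.0, 0.0, 0.0, 0.0])
y = X @ w_true + 0.05 * rng.normal(size=200)
groups = [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6)]
print(group_iht(X, y, groups, k=1).round(2))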


deepregression: a Flexible Neural Network Framework for Semi-Structured Deep Distributional Regression

David Rügamer, Ruolin Shen, Christina Bukas, Dominik Thalmeier, Nadja Klein, Chris Kolb, Florian Pfisterer, Philipp Kopper, Bernd Bischl, others, C. Müller

This paper describes the implementation of semi-structured deep distributional regression, a flexible framework to learn distributions based on a combination of additive regression models and deep neural networks. deepregression is implemented in both R and Python, using the deep learning libraries TensorFlow and PyTorch, respectively. The implementation consists of (1) a modular neural network building system for the combination of various statistical and deep learning approaches, (2) an orthogonalization cell to allow for an interpretable combination of different subnetworks, and (3) pre-processing steps necessary to initialize such models. The software package allows models to be defined in a user-friendly manner using distribution definitions via a formula environment that is inspired by classical statistical model frameworks such as mgcv. The package's modular design and functionality provide a unique resource for rapid and reproducible prototyping of complex statistical and deep learning models while simultaneously retaining the indispensable interpretability of classical statistical models.
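
For intuition, the following Keras sketch shows the core semi-structured idea: the location of a Normal distribution is modeled as the sum of a structured linear predictor and a deep network, and the model is fit by minimizing the Gaussian negative log-likelihood. This is not the deepregression API, and it omits the orthogonalization cell and formula interface; the toy data and layer sizes are assumptions.

import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(5,))
structured = tf.keras.layers.Dense(1, name="structured_linear")(inputs)  # additive linear part
hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)
deep = tf.keras.layers.Dense(1)(hidden)                                  # deep network part
mu = tf.keras.layers.Add()([structured, deep])                           # location of a Normal
log_sigma = tf.keras.layers.Dense(1)(inputs)                             # scale, also covariate-dependent
outputs = tf.keras.layers.Concatenate()([mu, log_sigma])

def gaussian_nll(y_true, params):
    # negative log-likelihood of N(mu, sigma^2) with params = [mu, log_sigma]
    m, log_s = params[:, :1], params[:, 1:]
    return 0.5 * np.log(2.0 * np.pi) + log_s + 0.5 * tf.square((y_true - m) / tf.exp(log_s))

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss=gaussian_nll)

# toy usage: linear signal plus heteroscedastic noise
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5)).astype("float32")
noise = 0.1 * np.exp(0.5 * X[:, 2:3]) * rng.normal(size=(1000, 1))
y = (2.0 * X[:, :1] - X[:, 1:2] + noise).astype("float32")
model.fit(X, y, epochs=10, verbose=0)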

April 6, 2021

Classification of Magnetohydrodynamic Simulations Using Wavelet Scattering Transforms

Andrew K. Saydjari, Stephen K. N. Portillo, Zachary Slepian, ..., B. Burkhart, et al.

The complex interplay of magnetohydrodynamics, gravity, and supersonic turbulence in the interstellar medium (ISM) introduces non-Gaussian structure that can complicate comparison between theory and observation. We show that the Wavelet Scattering Transform (WST), in combination with linear discriminant analysis (LDA), is sensitive to non-Gaussian structure in 2D ISM dust maps. WST-LDA classifies magnetohydrodynamic (MHD) turbulence simulations with up to a 97% true positive rate in our testbed of 8 simulations with varying sonic and Alfvénic Mach numbers. We present a side-by-side comparison with two other methods for non-Gaussian characterization, the Reduced Wavelet Scattering Transform (RWST) and the 3-Point Correlation Function (3PCF). We also demonstrate 3D-WST-LDA and apply it to classification of density fields in position-position-velocity (PPV) space, where density correlations can be studied using velocity coherence as a proxy. WST-LDA is robust to common observational artifacts, such as striping and missing data, while also being sensitive enough to extract the net magnetic field direction for sub-Alfvénic turbulent density fields. We include a brief analysis of the effect of point spread functions and image pixelization on 2D-WST-LDA applied to density fields, which informs the future goal of applying WST-LDA to 2D or 3D all-sky dust maps to extract hydrodynamic parameters of interest.
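
A minimal sketch of the WST-LDA classifier is given below, assuming the third-party kymatio package provides the 2D scattering transform through its Scattering2D interface; the toy fields and the J and L values are placeholders, and this is not the authors' pipeline. Scattering coefficients are flattened per map and fed to scikit-learn's linear discriminant analysis.

import numpy as np
from kymatio.numpy import Scattering2D                       # assumed interface
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def wst_features(images, J=3, L=8):
    # flattened scattering coefficients for a stack of square images
    S = Scattering2D(J=J, shape=images.shape[1:], L=L)
    return np.stack([S.scattering(img).ravel() for img in images])

# toy stand-in data: two "simulation classes" of 64x64 fields
rng = np.random.default_rng(0)
X0 = rng.normal(size=(20, 64, 64))                 # class 0: Gaussian field
X1 = np.abs(rng.normal(size=(20, 64, 64))) ** 2    # class 1: non-Gaussian field
X = wst_features(np.concatenate([X0, X1]).astype(np.float32))
y = np.array([0] * 20 + [1] * 20)

clf = LinearDiscriminantAnalysis().fit(X, y)
print("training accuracy:", clf.score(X, y))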


Ensuring scientific reproducibility in bio-macromolecular modeling via extensive, automated benchmarks

J. Koehler, S. Lyskov, S. Lewis, J. Adolf-Bryfogle, R. Alford, K. Barlow, Z. Ben-Aharon, D. Farrell, J. Fell, W. Hansen, A. Harmalkar, J. Jeliazkov, G. Kuenze, J. Krys, A. Ljubetič, A. Loshbaugh, J. Maguire, R. Moretti, V. Mulligan, R. Bonneau, et al.

Each year vast international resources are wasted on irreproducible research. The scientific community has been slow to adopt standard software engineering practices, despite increases in high-dimensional data and in the complexity of workflows and computational environments. Here we show how scientific software applications can be created in a reproducible manner when simple design goals for reproducibility are met. We describe the implementation of a test server framework and 40 scientific benchmarks, covering numerous applications in Rosetta bio-macromolecular modeling. High performance computing cluster integration allows these benchmarks to run continuously and automatically. Detailed protocol captures are useful for developers and users of Rosetta and other macromolecular modeling tools. The framework and design concepts presented here are valuable for developers and users of any type of scientific software and for the scientific community to create reproducible methods. Specific examples highlight the utility of this framework, and the comprehensive documentation illustrates the ease of adding new tests in a matter of hours.
