2573 Publications

ChIP-BIT2: a software tool to detect weak binding events using a Bayesian integration approach

X. Chen, A. Neuwald, L. Hilakivi-Clarke, R. Clarke, J. Xuan

Background
ChIP-seq combines chromatin immunoprecipitation assays with sequencing to identify genome-wide binding sites for DNA-binding proteins. While many binding sites produce strong ChIP-seq ‘peak’ observations and are well captured, other regions are bound only weakly, with relatively low ChIP-seq signal enrichment. These weak binding sites, especially those at promoters and enhancers, are functionally important because they also regulate nearby gene expression. Yet it remains a challenge to accurately identify weak binding sites in ChIP-seq data, owing to the ambiguity in differentiating them from amplified background DNA.

Results
ChIP-BIT2 (http://sourceforge.net/projects/chipbitc/) is a software package for ChIP-seq peak detection. ChIP-BIT2 employs a mixture model integrating protein and control ChIP-seq data and predicts strong or weak protein binding sites at promoters, enhancers, or other genomic locations. For binding sites at gene promoters, ChIP-BIT2 simultaneously predicts their target genes. ChIP-BIT2 has been validated on benchmark regions and tested using large-scale ENCODE ChIP-seq data, demonstrating its high accuracy and wide applicability.
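To illustrate the core idea of separating bound sites from background with a mixture model (a hedged sketch only, not ChIP-BIT2's actual model, data format, or API; all distributions, values, and names below are invented), a minimal two-component Gaussian mixture fit by EM might look like this:

```python
import math
import random

random.seed(0)

# Synthetic log-enrichment scores: 300 background sites near 0 and
# 100 bound sites near 2. Values are illustrative only.
data = [random.gauss(0.0, 0.5) for _ in range(300)] + \
       [random.gauss(2.0, 0.5) for _ in range(100)]

def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# EM for a two-component Gaussian mixture with a shared scale.
pi, mu0, mu1, sd = 0.5, min(data), max(data), 1.0
for _ in range(100):
    # E-step: posterior probability that each site is bound.
    post = [pi * norm_pdf(x, mu1, sd) /
            (pi * norm_pdf(x, mu1, sd) + (1 - pi) * norm_pdf(x, mu0, sd))
            for x in data]
    # M-step: update mixing weight, component means, and shared scale.
    w1 = sum(post)
    pi = w1 / len(data)
    mu1 = sum(p * x for p, x in zip(post, data)) / w1
    mu0 = sum((1 - p) * x for p, x in zip(post, data)) / (len(data) - w1)
    sd = math.sqrt(sum(p * (x - mu1) ** 2 + (1 - p) * (x - mu0) ** 2
                       for p, x in zip(post, data)) / len(data))

def p_bound(x):
    """Posterior probability that enrichment x reflects real binding."""
    s = pi * norm_pdf(x, mu1, sd)
    return s / (s + (1 - pi) * norm_pdf(x, mu0, sd))

# A weakly enriched site can still receive a calibrated posterior above
# 0.5, which is the advantage over a hard enrichment cutoff.
print(round(mu0, 2), round(mu1, 2), round(p_bound(1.2), 2))
```

The posterior from such a model is what lets a weak but genuine site be distinguished from background, rather than being discarded by a fixed peak-height threshold.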

Conclusion
ChIP-BIT2 is an efficient ChIP-seq peak caller. It provides a better lens for examining weak binding sites and can refine or extend the existing binding site collection, yielding additional regulatory regions for decoding the mechanisms of gene expression regulation.

April 15, 2021

Automated and scalable analysis pipelines for voltage imaging datasets

J. Friedrich, E. Pnevmatikakis, C. Cai, A. Singh, M. Hossein Eybposh, K. Podgorski, A. Giovannucci

Voltage imaging enables monitoring neural activity at sub-millisecond and sub-cellular scale, unlocking the study of subthreshold activity, synchrony, and network dynamics with unprecedented spatio-temporal resolution. However, high data rates (>800 MB/s) and low signal-to-noise ratios create bottlenecks for analyzing such datasets. Here we present VolPy, an automated and scalable pipeline to pre-process voltage imaging datasets. VolPy features motion correction, memory mapping, automated segmentation, denoising and spike extraction, all built on a highly parallelizable, modular, and extensible framework optimized for memory and speed. To aid automated segmentation, we introduce a corpus of 24 manually annotated datasets from different preparations, brain areas and voltage indicators. We benchmark VolPy against ground truth segmentation, simulations and electrophysiology recordings, and we compare its performance with existing algorithms in detecting spikes. Our results indicate that VolPy’s performance in spike extraction and scalability are state-of-the-art.
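For intuition about the spike-extraction step only (this is not VolPy's algorithm or API; the trace, window size, and threshold factor are invented), one classic recipe is to remove slow drift with a running median and then threshold at a multiple of the median absolute deviation (MAD):

```python
import random

random.seed(1)

# Synthetic fluorescence trace: slow drift + noise, with brief spikes
# planted at known frames. All parameters are illustrative.
n = 2000
spike_frames = [200, 650, 1100, 1600]
trace = [0.001 * t + random.gauss(0.0, 0.05) for t in range(n)]
for s in spike_frames:
    trace[s] += 1.0  # a fast, large transient

def running_median(x, w):
    half = w // 2
    out = []
    for i in range(len(x)):
        win = sorted(x[max(0, i - half):i + half + 1])
        out.append(win[len(win) // 2])
    return out

# High-pass: subtract the running median to remove slow drift, then
# threshold the residual at 8x the MAD, keeping only local maxima.
baseline = running_median(trace, 101)
hp = [a - b for a, b in zip(trace, baseline)]
med = sorted(hp)[n // 2]
mad = sorted(abs(v - med) for v in hp)[n // 2]
thresh = med + 8.0 * mad

spikes = [i for i in range(1, n - 1)
          if hp[i] > thresh and hp[i] >= hp[i - 1] and hp[i] > hp[i + 1]]
print(spikes)
```

Real voltage-imaging pipelines must also handle motion, segmentation, and far larger data volumes, which is where the parallel, memory-mapped design described above matters.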

2021


Broadband Multi-wavelength Properties of M87 during the 2017 Event Horizon Telescope Campaign

The Event Horizon Telescope Collaboration, The Fermi Large Area Telescope Collaboration, H.E.S.S. Collaboration, MAGIC Collaboration, VERITAS Collaboration, EAVN Collaboration, J. C. Algaba, J. Anczarski, K. Asada, ..., B. Ripperda, et al.

In 2017, the Event Horizon Telescope (EHT) Collaboration succeeded in capturing the first direct image of the center of the M87 galaxy. The asymmetric ring morphology and size are consistent with theoretical expectations for a weakly accreting supermassive black hole of mass approximately 6.5 × 10^9 solar masses. The EHT Collaboration also partnered with several international facilities in space and on the ground to arrange an extensive, quasi-simultaneous multi-wavelength campaign. This Letter presents the results and analysis of this campaign, as well as the multi-wavelength data as a legacy data repository. We captured M87 in a historically low state, and the core flux dominates over HST-1 at high energies, making it possible to combine core flux constraints with the more spatially precise very long baseline interferometry data. We present the most complete simultaneous multi-wavelength spectrum of the active nucleus to date, and discuss the complexity and caveats of combining data from different spatial scales into one broadband spectrum. We apply two heuristic, isotropic leptonic single-zone models to provide insight into the basic source properties, but conclude that a structured jet is necessary to explain M87's spectrum. We can exclude that the simultaneous gamma-ray emission is produced via inverse Compton emission in the same region producing the EHT mm-band emission, and further conclude that the gamma-rays can only be produced in the inner jets (inward of HST-1) if there are strongly particle-dominated regions. Direct synchrotron emission from accelerated protons and secondaries cannot yet be excluded.


Variational Monte Carlo calculations of A≤4 nuclei with an artificial neural-network correlator ansatz

Corey Adams, G. Carleo, Alessandro Lovato, Noemi Rocco

The complexity of many-body quantum wave functions is a central aspect of several fields of physics and chemistry where non-perturbative interactions are prominent. Artificial neural networks (ANNs) have proven to be a flexible tool to approximate quantum many-body states in condensed matter and chemistry problems. In this work we introduce a neural-network quantum state ansatz to model the ground-state wave function of light nuclei, and approximately solve the nuclear many-body Schrödinger equation. Using efficient stochastic sampling and optimization schemes, our approach extends pioneering applications of ANNs in the field, which present exponentially-scaling algorithmic complexity. We compute the binding energies and point-nucleon densities of A≤4 nuclei as emerging from a leading-order pionless effective field theory Hamiltonian. We successfully benchmark the ANN wave function against more conventional parametrizations based on two- and three-body Jastrow functions, and virtually-exact Green's function Monte Carlo results.
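To make the variational Monte Carlo loop concrete, here is a toy sketch for the 1D harmonic oscillator (ħ = m = ω = 1) with a one-parameter Gaussian ansatz instead of the paper's neural-network correlator; the Metropolis sampling and local-energy average are the same ingredients, but everything else here is invented for illustration:

```python
import math
import random

random.seed(3)

def vmc_energy(a, n_steps=20000, step=1.0, burn=1000):
    """Mean local energy of psi(x) = exp(-a x^2 / 2) for
    H = -(1/2) d^2/dx^2 + (1/2) x^2.

    Local energy: E_L(x) = a/2 + x^2 (1 - a^2) / 2.
    Metropolis acceptance uses |psi(x')/psi(x)|^2 = exp(-a (x'^2 - x^2)).
    """
    x, total, count = 0.0, 0.0, 0
    for i in range(n_steps):
        x_new = x + random.uniform(-step, step)
        if random.random() < math.exp(min(0.0, -a * (x_new ** 2 - x ** 2))):
            x = x_new
        if i >= burn:
            total += a / 2 + x ** 2 * (1 - a ** 2) / 2
            count += 1
    return total / count

# At the exact ground state (a = 1) the local energy is constant (0.5),
# so the estimate has zero variance; away from it the energy rises.
print(vmc_energy(1.0), vmc_energy(0.5))
```

The variational principle guarantees the estimate is at least 0.5, with equality at a = 1; an optimizer (or, in the paper, updates to the ANN parameters) descends this landscape, and the vanishing local-energy variance at the exact state is the standard convergence diagnostic.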


Learning physically consistent differential equation models from data using group sparsity

Suryanarayana Maddu, Bevan L Cheeseman, Ivo F Sbalzarini, C. Müller

We propose a statistical learning framework based on group-sparse regression that can be used to 1) enforce conservation laws, 2) ensure model equivalence, and 3) guarantee symmetries when learning or inferring differential-equation models from measurement data. Directly learning interpretable mathematical models from data has emerged as a valuable modeling approach. However, in areas like biology, high noise levels, sensor-induced correlations, and strong inter-system variability can render data-driven models nonsensical or physically inconsistent without additional constraints on the model structure. Hence, it is important to leverage prior knowledge from physical principles to learn "biologically plausible and physically consistent" models rather than models that simply fit the data best. We present a novel group Iterative Hard Thresholding (gIHT) algorithm and use stability selection to infer physically consistent models with minimal parameter tuning. We show several applications from systems biology that demonstrate the benefits of enforcing priors in data-driven modeling.
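A toy sketch of the group hard-thresholding idea (not the paper's gIHT implementation; dimensions, step size, and data are invented): take a gradient step on the least-squares loss, then keep only the k coefficient groups with the largest l2 norm, zeroing the rest:

```python
import random

random.seed(2)

# 9 coefficients in 3 groups of 3; only group 0 is truly active.
n = 60
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
beta_true = [1.0, -2.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
X = [[random.gauss(0.0, 1.0) for _ in range(9)] for _ in range(n)]
y = [sum(xi[j] * beta_true[j] for j in range(9)) + random.gauss(0.0, 0.1)
     for xi in X]

def grad(beta):
    """Gradient of the least-squares loss (1/2n) * ||y - X beta||^2."""
    r = [sum(xi[j] * beta[j] for j in range(9)) - yi for xi, yi in zip(X, y)]
    return [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(9)]

beta, step, k = [0.0] * 9, 0.1, 1
for _ in range(200):
    g = grad(beta)
    beta = [b - step * gj for b, gj in zip(beta, g)]
    # Group hard threshold: keep only the k groups with largest l2 norm.
    ranked = sorted(groups, key=lambda grp: -sum(beta[j] ** 2 for j in grp))
    keep = {j for grp in ranked[:k] for j in grp}
    beta = [b if j in keep else 0.0 for j, b in enumerate(beta)]

print([round(b, 2) for b in beta])
```

Because entire groups are kept or zeroed together, a structural constraint (here, which group may be active) is enforced exactly at every iterate, which is the mechanism the abstract describes for building conservation laws and symmetries into the learned model.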


deepregression: a Flexible Neural Network Framework for Semi-Structured Deep Distributional Regression

David Rügamer, Ruolin Shen, Christina Bukas, Dominik Thalmeier, Nadja Klein, Chris Kolb, Florian Pfisterer, Philipp Kopper, Bernd Bischl, ..., C. Müller

This paper describes the implementation of semi-structured deep distributional regression, a flexible framework to learn distributions based on a combination of additive regression models and deep neural networks. deepregression is implemented in both R and Python, using the deep learning libraries TensorFlow and PyTorch, respectively. The implementation consists of (1) a modular neural network building system for the combination of various statistical and deep learning approaches, (2) an orthogonalization cell to allow for an interpretable combination of different subnetworks, as well as (3) pre-processing steps necessary to initialize such models. The software package allows users to define models in a user-friendly manner via distribution definitions in a formula environment that is inspired by classical statistical model frameworks such as mgcv. The package's modular design and functionality provide a unique resource for rapid and reproducible prototyping of complex statistical and deep learning models while simultaneously retaining the indispensable interpretability of classical statistical models.
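Independent of the package's actual R/Python API (not reproduced here), the core idea of distributional regression — learning every parameter of an outcome distribution, not just its mean — can be sketched with a hand-rolled Gaussian negative log-likelihood; the data, learning rate, and iteration count are invented:

```python
import math
import random

random.seed(4)

# Synthetic data: y ~ Normal(2x + 1, 0.5). We fit the mean coefficients
# w, b and the log standard deviation s jointly by gradient descent on
# the Gaussian negative log-likelihood.
n = 200
xs = [random.uniform(0.0, 2.0) for _ in range(n)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 0.5) for x in xs]

w, b, s, lr = 0.0, 0.0, 0.0, 0.05
for _ in range(5000):
    inv_var = math.exp(-2.0 * s)
    res = [y - (w * x + b) for x, y in zip(xs, ys)]
    # Gradients of the mean NLL = s + res^2 / (2 e^{2s}) + const.
    gw = -sum(r * x for r, x in zip(res, xs)) * inv_var / n
    gb = -sum(res) * inv_var / n
    gs = 1.0 - sum(r * r for r in res) * inv_var / n
    w, b, s = w - lr * gw, b - lr * gb, s - lr * gs

print(round(w, 2), round(b, 2), round(math.exp(s), 2))
```

In the framework described above, the linear predictor for each distribution parameter would be replaced by a structured additive term plus a deep network, with an orthogonalization step keeping the structured part interpretable.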

April 6, 2021

Comment on “Stepped pressure profile equilibria in cylindrical plasmas via partial Taylor relaxation” [J. Plasma Physics (2006), vol. 72, part 6, pp. 1167–1171]

Yuanfan Wang, D. Malhotra, Antoine J. Cerfon

In an early study of the properties and capabilities of the multiregion, relaxed magnetohydrodynamic model, Hole, Hudson & Dewar claim that they are able to construct a multiregion stepped pressure cylindrical equilibrium which does not require the existence of surface currents. We present a brief argument showing that this claim is incorrect, and clarify the meaning of their statement. Furthermore, even with the statement clarified, we demonstrate that it is not possible to find solutions to reproduce the equilibrium corresponding to the parameters given in the article. We invite the authors to provide a corrigendum with the correct values of the equilibrium they constructed.


Ensuring scientific reproducibility in bio-macromolecular modeling via extensive, automated benchmarks

J. Koehler, S. Lyskov, S. Lewis, J. Adolf-Bryfogle, R. Alford, K. Barlow, Z. Ben-Aharon, D. Farrell, J. Fell, W. Hansen, A. Harmalkar, J. Jeliazkov, G. Kuenze, J. Krys, A. Ljubetič, A. Loshbaugh, J. Maguire, R. Moretti, V. Mulligan, R. Bonneau, et al.

Each year vast international resources are wasted on irreproducible research. The scientific community has been slow to adopt standard software engineering practices, despite increases in the dimensionality of data and the complexity of workflows and computational environments. Here we show how scientific software applications can be created in a reproducible manner when simple design goals for reproducibility are met. We describe the implementation of a test server framework and 40 scientific benchmarks, covering numerous applications in Rosetta bio-macromolecular modeling. High-performance computing cluster integration allows these benchmarks to run continuously and automatically. Detailed protocol captures are useful for developers and users of Rosetta and other macromolecular modeling tools. The framework and design concepts presented here are valuable for developers and users of any type of scientific software, and for the scientific community in creating reproducible methods. Specific examples highlight the utility of this framework, and the comprehensive documentation illustrates the ease of adding new tests in a matter of hours.


Classification of Magnetohydrodynamic Simulations Using Wavelet Scattering Transforms

Andrew K. Saydjari, Stephen K. N. Portillo, Zachary Slepian, ..., B. Burkart, et al.

The complex interplay of magnetohydrodynamics, gravity, and supersonic turbulence in the interstellar medium (ISM) introduces non-Gaussian structure that can complicate comparison between theory and observation. We show that the Wavelet Scattering Transform (WST), in combination with linear discriminant analysis (LDA), is sensitive to non-Gaussian structure in 2D ISM dust maps. WST-LDA classifies magnetohydrodynamic (MHD) turbulence simulations with up to a 97% true positive rate in our testbed of 8 simulations with varying sonic and Alfvénic Mach numbers. We present a side-by-side comparison with two other methods for non-Gaussian characterization, the Reduced Wavelet Scattering Transform (RWST) and the 3-Point Correlation Function (3PCF). We also demonstrate the 3D-WST-LDA and apply it to classification of density fields in position-position-velocity (PPV) space, where density correlations can be studied using velocity coherence as a proxy. WST-LDA is robust to common observational artifacts, such as striping and missing data, while also sensitive enough to extract the net magnetic field direction for sub-Alfvénic turbulent density fields. We include a brief analysis of the effect of point spread functions and image pixelization on 2D-WST-LDA applied to density fields, which informs the future goal of applying WST-LDA to 2D or 3D all-sky dust maps to extract hydrodynamic parameters of interest.
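As a schematic of the classification stage only (the wavelet scattering features are not reproduced here; the toy "fields", their distributions, and the summary statistics are invented stand-ins), a pure-Python two-class Fisher LDA on non-Gaussianity summaries might look like:

```python
import math
import random

random.seed(5)

def features(field):
    """Two toy non-Gaussianity summaries of a 1D 'field': its standard
    deviation and a kurtosis estimate (stand-ins for scattering
    coefficients, which are not computed here)."""
    n = len(field)
    m = sum(field) / n
    m2 = sum((v - m) ** 2 for v in field) / n
    m4 = sum((v - m) ** 4 for v in field) / n
    return (math.sqrt(m2), m4 / (m2 * m2))

def make_field(heavy, n=200):
    if heavy:  # Laplace-distributed values: heavier tails, kurtosis ~ 6
        return [random.expovariate(1.0) * random.choice((-1, 1)) for _ in range(n)]
    return [random.gauss(0.0, 1.0) for _ in range(n)]  # Gaussian, kurtosis ~ 3

train = [(features(make_field(c)), c) for c in (0, 1) for _ in range(40)]
test = [(features(make_field(c)), c) for c in (0, 1) for _ in range(40)]

# Fisher LDA: w = pooled_cov^{-1} (mu1 - mu0), threshold at the midpoint.
def mean(rows):
    return [sum(r[i] for r in rows) / len(rows) for i in (0, 1)]

f0 = [f for f, c in train if c == 0]
f1 = [f for f, c in train if c == 1]
mu0, mu1 = mean(f0), mean(f1)
cov = [[0.0, 0.0], [0.0, 0.0]]
for rows, mu in ((f0, mu0), (f1, mu1)):
    for r in rows:
        for i in (0, 1):
            for j in (0, 1):
                cov[i][j] += (r[i] - mu[i]) * (r[j] - mu[j]) / (len(train) - 2)
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
inv = [[cov[1][1] / det, -cov[0][1] / det],
       [-cov[1][0] / det, cov[0][0] / det]]
w = [sum(inv[i][j] * (mu1[j] - mu0[j]) for j in (0, 1)) for i in (0, 1)]
mid = sum(w[i] * (mu0[i] + mu1[i]) / 2 for i in (0, 1))

acc = sum((sum(w[i] * f[i] for i in (0, 1)) > mid) == bool(c)
          for f, c in test) / len(test)
print(acc)
```

The paper's pipeline replaces these hand-picked summaries with scattering coefficients, which capture non-Gaussian structure far more richly; the LDA step itself is this simple.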
