2697 Publications

Modeling transcriptional regulation of model species with deep learning

E. Cofer, A. Wong, O. Troyanskaya, et al.

To enable large-scale analyses of regulatory logic in model species, we developed DeepArk, a set of deep learning models of the cis-regulatory codes of four widely studied species: Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, and Mus musculus. DeepArk accurately predicts the presence of thousands of context-specific regulatory features, including chromatin states, histone marks, and transcription factors. In vivo studies show that DeepArk can predict the regulatory impact of any genomic variant (including rare or previously unobserved variants) and enables the regulatory annotation of understudied model species.
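Variant-effect prediction of this kind generally scores the change in a sequence model's output between the reference and alternate alleles. A minimal sketch of that idea, with a generic one-hot encoding and a stand-in `model` callable (an illustrative assumption, not DeepArk's actual API):

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA sequence as a (len(seq), 4) array."""
    idx = np.array([BASES.index(b) for b in seq])
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), idx] = 1.0
    return out

def variant_effect(model, seq, pos, alt):
    """Score a single-nucleotide variant as the change in the model's
    predicted regulatory-feature values between alt and reference alleles."""
    ref_x = one_hot(seq)
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    alt_x = one_hot(alt_seq)
    return model(alt_x) - model(ref_x)
```

With a trained multi-task model, the returned difference vector would span thousands of context-specific features, giving a per-feature regulatory impact score for the variant.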

April 19, 2021

Molecular mechanisms underlying cellular effects of human MEK1 mutations

R. Marmion, L. Yang, Y. Goyal, G. Jindal, J. Wetzel, M. Singh, T. Schüpbach, S. Shvartsman

Terminal regions of Drosophila embryos are patterned by signaling through ERK, which is genetically deregulated in multiple human diseases. Quantitative studies of terminal patterning have been recently used to investigate gain-of-function variants of human MEK1, encoding the MEK kinase that directly activates ERK by dual phosphorylation. Unexpectedly, several mutations reduced ERK activation by extracellular signals, possibly through a negative feedback triggered by signal-independent activity of the mutant variants. Here we present experimental evidence supporting this model. Using a MEK variant that combines a mutation within the negative regulatory region with alanine substitutions in the activation loop, we prove that pathogenic variants indeed acquire signal-independent kinase activity. We also demonstrate that signal-dependent activation of these variants is independent of kinase suppressor of Ras, a conserved adaptor that is indispensable for activation of normal MEK. Finally, we show that attenuation of ERK activation by extracellular signals stems from transcriptional induction of Mkp3, a dual specificity phosphatase that deactivates ERK by dephosphorylation. These findings in the Drosophila embryo highlight its power for investigating diverse effects of human disease mutations.


The Galaxy Progenitors of Stellar Streams around Milky Way-mass Galaxies in the FIRE Cosmological Simulations

N. Panithanpaisal, R. Sanderson, A. Wetzel, E. Cunningham, J. Bailin, C-A. Faucher-Giguère

Stellar streams record the accretion history of their host galaxy. We present a set of simulated streams from disrupted dwarf galaxies in 13 cosmological simulations of Milky Way (MW)-mass galaxies from the FIRE-2 suite at z=0, including 7 isolated Milky Way-mass systems and 6 hosts resembling the MW-M31 pair (full dataset at: this https URL). In total, we identify 106 simulated stellar streams, with no significant differences in the number of streams and masses of their progenitors between the isolated and paired environments. We resolve simulated streams with stellar masses ranging from ∼5×10^5 up to ∼10^9 M⊙, similar to the mass range between the Orphan and Sagittarius streams in the MW. We confirm that present-day simulated satellite galaxies are good proxies for stellar stream progenitors, with similar properties including their stellar mass function, velocity dispersion, [Fe/H] and [α/H] evolution tracks, and orbital distribution with respect to the galactic disk plane. Each progenitor's lifetime is marked by several important timescales: its infall, star-formation quenching, and stream-formation times. We show that the ordering of these timescales is different between progenitors with stellar masses higher and lower than ∼2×10^6 M⊙. Finally, we show that the main factor controlling the rate of phase-mixing, and therefore fading, of tidal streams from satellite galaxies in MW-mass hosts is non-adiabatic evolution of the host potential. Other factors commonly used to predict phase-mixing timescales, such as progenitor mass and orbital circularity, show virtually no correlation with the number of dynamical times required for a stream to become phase-mixed.

April 19, 2021

Fast multipole methods for evaluation of layer potentials with locally-corrected quadratures

L. Greengard, M. O'Neil, M. Rachh, F. Vico

While fast multipole methods (FMMs) are in widespread use for the rapid evaluation of potential fields governed by the Laplace, Helmholtz, Maxwell, or Stokes equations, their coupling to high-order quadratures for evaluating layer potentials is still an area of active research. In three dimensions, a number of issues need to be addressed, including the specification of the surface as the union of high-order patches, the incorporation of accurate quadrature rules for integrating singular or weakly singular Green's functions on such patches, and their coupling to the oct-tree data structures on which the FMM separates near and far field interactions. Although the latter is straightforward for point distributions, the near field for a patch is determined by its physical dimensions, not the distribution of discretization points on the surface. Here, we present a general framework for efficiently coupling locally corrected quadratures with FMMs, relying primarily on what are called generalized Gaussian quadrature rules, supplemented by adaptive integration. The approach, however, is quite general and easily applicable to other schemes, such as Quadrature by Expansion (QBX). We also introduce a number of accelerations to reduce the cost of quadrature generation itself, and present several numerical examples of acoustic scattering that demonstrate the accuracy, robustness, and computational efficiency of the scheme. On a single core of an Intel i5 2.3GHz processor, a Fortran implementation of the scheme can generate near field quadrature corrections for between 1000 and 10,000 points per second, depending on the order of accuracy and the desired precision. A Fortran implementation of the algorithm described in this work is available at https://gitlab.com/fastalgorithms/fmm3dbie.
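The adaptive-integration fallback mentioned above typically grades quadrature panels geometrically toward the singular point. A self-contained one-dimensional sketch of that idea, using dyadically refined Gauss-Legendre panels for an integrand with an integrable endpoint singularity (illustrative only; the paper's scheme operates on high-order surface patches in 3D):

```python
import numpy as np

def gauss_panel(f, a, b, nodes, weights):
    """n-point Gauss-Legendre rule mapped from [-1, 1] to [a, b]."""
    x = 0.5 * (b - a) * nodes + 0.5 * (a + b)
    return 0.5 * (b - a) * np.sum(weights * f(x))

def dyadic_singular_quad(f, levels=40, n=16):
    """Integrate f over (0, 1] when f has an integrable singularity at 0,
    using Gauss-Legendre panels graded dyadically toward the singularity:
    [1/2, 1], [1/4, 1/2], ..., [2**-levels, 2**-(levels-1)]."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    total = 0.0
    b = 1.0
    for _ in range(levels):
        a = b / 2.0
        total += gauss_panel(f, a, b, nodes, weights)
        b = a
    # the remaining interval (0, 2**-levels) is dropped; its contribution
    # is negligible for mildly singular integrands
    return total
```

On each dyadic panel the integrand is smooth, so a fixed-order Gauss rule converges rapidly; for example `dyadic_singular_quad(np.log)` recovers the exact value of ∫₀¹ ln x dx = −1 to near machine precision.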


ChIP-BIT2: a software tool to detect weak binding events using a Bayesian integration approach

X. Chen, A. Neuwald, L. Hilakivi-Clarke, R. Clarke, J. Xuan

Background
ChIP-seq combines chromatin immunoprecipitation assays with sequencing and identifies genome-wide binding sites for DNA binding proteins. While many binding sites have strong ChIP-seq ‘peak’ observations and are well captured, there remain regions weakly bound by proteins, with relatively low ChIP-seq signal enrichment. These weak binding sites, especially those at promoters and enhancers, are functionally important because they also regulate nearby gene expression. Yet, it remains a challenge to accurately identify weak binding sites in ChIP-seq data due to the difficulty of differentiating them from the amplified background DNA.

Results
ChIP-BIT2 (http://sourceforge.net/projects/chipbitc/) is a software package for ChIP-seq peak detection. ChIP-BIT2 employs a mixture model integrating protein and control ChIP-seq data and predicts strong or weak protein binding sites at promoters, enhancers, or other genomic locations. For binding sites at gene promoters, ChIP-BIT2 simultaneously predicts their target genes. ChIP-BIT2 has been validated on benchmark regions and tested using large-scale ENCODE ChIP-seq data, demonstrating its high accuracy and wide applicability.

Conclusion
ChIP-BIT2 is an efficient ChIP-seq peak caller. It provides a better lens to examine weak binding sites and can refine or extend the existing binding site collection, providing additional regulatory regions for decoding the mechanism of gene expression regulation.
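As a toy illustration of the mixture-model idea underlying this kind of peak calling, a two-component 1-D Gaussian mixture fit by EM can separate enriched ('signal') from background read intensities. This is a generic sketch, not ChIP-BIT2's actual model, which jointly integrates protein and control ChIP-seq samples:

```python
import numpy as np

def em_two_gauss(x, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM and return the
    posterior probability that each observation came from the
    higher-mean ('signal') component."""
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.array([x.var(), x.var()]) + 1e-6
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities under the current parameters
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted parameter updates
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / len(x)
    return resp[:, np.argmax(mu)]
```

Sites whose posterior signal probability is moderate rather than near 0 or 1 correspond to the weak-binding regime that hard thresholds on peak height tend to miss.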

April 15, 2021

Automated and scalable analysis pipelines for voltage imaging datasets

J. Friedrich, E. Pnevmatikakis, C. Cai, A. Singh, M. Hossein Eybposh, K. Podgorski, A. Giovannucci

Voltage imaging enables monitoring neural activity at sub-millisecond and sub-cellular scale, unlocking the study of subthreshold activity, synchrony, and network dynamics with unprecedented spatio-temporal resolution. However, high data rates (>800MB/s) and low signal-to-noise ratios create bottlenecks for analyzing such datasets. Here we present VolPy, an automated and scalable pipeline to pre-process voltage imaging datasets. VolPy features motion correction, memory mapping, automated segmentation, denoising and spike extraction, all built on a highly parallelizable, modular, and extensible framework optimized for memory and speed. To aid automated segmentation, we introduce a corpus of 24 manually annotated datasets from different preparations, brain areas and voltage indicators. We benchmark VolPy against ground truth segmentation, simulations and electrophysiology recordings, and we compare its performance with existing algorithms in detecting spikes. Our results indicate that VolPy’s performance in spike extraction and scalability are state-of-the-art.
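Spike extraction from a denoised fluorescence trace is, at its simplest, a threshold-crossing problem. A generic sketch using a robust (MAD-based) adaptive threshold (an assumed scheme for illustration, not VolPy's actual algorithm):

```python
import numpy as np

def detect_spikes(trace, thresh_std=3.5):
    """Return indices where a median-subtracted trace first crosses an
    adaptive threshold set at thresh_std robust standard deviations."""
    hp = trace - np.median(trace)
    sigma = np.median(np.abs(hp)) / 0.6745  # robust std estimate via MAD
    above = hp > thresh_std * sigma
    # keep only rising edges so each suprathreshold event is counted once
    return np.flatnonzero(above & ~np.roll(above, 1))
```

A real pipeline would apply this after motion correction and denoising, and at >800 MB/s data rates would run it on memory-mapped chunks rather than whole traces.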

2021

Broadband Multi-wavelength Properties of M87 during the 2017 Event Horizon Telescope Campaign

The Event Horizon Telescope Collaboration, The Fermi Large Area Telescope Collaboration, H.E.S.S. Collaboration, MAGIC Collaboration, VERITAS Collaboration, EAVN Collaboration, J. C. Algaba, J. Anczarski, K. Asada, ..., B. Ripperda, et al.

In 2017, the Event Horizon Telescope (EHT) Collaboration succeeded in capturing the first direct image of the center of the M87 galaxy. The asymmetric ring morphology and size are consistent with theoretical expectations for a weakly accreting supermassive black hole of mass approximately 6.5 × 10^9 M⊙. The EHT Collaboration also partnered with several international facilities, in space and on the ground, to arrange an extensive, quasi-simultaneous multi-wavelength campaign. This Letter presents the results and analysis of this campaign, as well as the multi-wavelength data as a legacy data repository. We captured M87 in a historically low state, and the core flux dominates over HST-1 at high energies, making it possible to combine core flux constraints with the more spatially precise very long baseline interferometry data. We present the most complete simultaneous multi-wavelength spectrum of the active nucleus to date, and discuss the complexity and caveats of combining data from different spatial scales into one broadband spectrum. We apply two heuristic, isotropic leptonic single-zone models to provide insight into the basic source properties, but conclude that a structured jet is necessary to explain M87's spectrum. We can exclude the possibility that the simultaneous gamma-ray emission is produced via inverse Compton emission in the same region that produces the EHT mm-band emission, and further conclude that the gamma-rays can only be produced in the inner jets (inward of HST-1) if there are strongly particle-dominated regions. Direct synchrotron emission from accelerated protons and secondaries cannot yet be excluded.


Variational Monte Carlo calculations of A≤4 nuclei with an artificial neural-network correlator ansatz

C. Adams, G. Carleo, A. Lovato, N. Rocco

The complexity of many-body quantum wave functions is a central aspect of several fields of physics and chemistry where non-perturbative interactions are prominent. Artificial neural networks (ANNs) have proven to be a flexible tool to approximate quantum many-body states in condensed matter and chemistry problems. In this work we introduce a neural-network quantum state ansatz to model the ground-state wave function of light nuclei, and approximately solve the nuclear many-body Schrödinger equation. Using efficient stochastic sampling and optimization schemes, our approach extends pioneering applications of ANNs in the field, which present exponentially scaling algorithmic complexity. We compute the binding energies and point-nucleon densities of A≤4 nuclei as emerging from a leading-order pionless effective field theory Hamiltonian. We successfully benchmark the ANN wave function against more conventional parametrizations based on two- and three-body Jastrow functions, and virtually exact Green's function Monte Carlo results.


Learning physically consistent differential equation models from data using group sparsity

S. Maddu, B. L. Cheeseman, I. F. Sbalzarini, C. Müller

We propose a statistical learning framework based on group-sparse regression that can be used to 1) enforce conservation laws, 2) ensure model equivalence, and 3) guarantee symmetries when learning or inferring differential-equation models from measurement data. Directly learning interpretable mathematical models from data has emerged as a valuable modeling approach. However, in areas like biology, high noise levels, sensor-induced correlations, and strong inter-system variability can render data-driven models nonsensical or physically inconsistent without additional constraints on the model structure. Hence, it is important to leverage prior knowledge from physical principles to learn "biologically plausible and physically consistent" models rather than models that simply fit the data best. We present a novel group Iterative Hard Thresholding (gIHT) algorithm and use stability selection to infer physically consistent models with minimal parameter tuning. We show several applications from systems biology that demonstrate the benefits of enforcing priors in data-driven modeling.
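Group-sparse regression of this kind keeps or zeroes whole blocks of coefficients at once, so a conservation law or symmetry encoded as a group either enters the model intact or not at all. A minimal sketch of a group iterative hard-thresholding loop, under illustrative assumptions (plain least-squares loss, a fixed group budget k, and a 1/L step size; the paper's gIHT additionally uses stability selection and physics-derived group structure):

```python
import numpy as np

def group_hard_threshold(w, groups, k):
    """Keep the k coefficient groups with largest l2 norm; zero the rest."""
    norms = np.array([np.linalg.norm(w[g]) for g in groups])
    keep = np.argsort(norms)[-k:]
    out = np.zeros_like(w)
    for i in keep:
        out[groups[i]] = w[groups[i]]
    return out

def giht(X, y, groups, k, step=None, n_iter=200):
    """Illustrative group IHT for min ||y - Xw||^2 subject to at most
    k active coefficient groups."""
    n, p = X.shape
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1/L, L = spectral norm squared
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)          # least-squares gradient
        w = group_hard_threshold(w - step * grad, groups, k)
    return w
```

On noiseless data with one truly active group, the loop recovers the correct group support and coefficients; in the paper's setting the groups would encode candidate differential-equation terms tied together by physical constraints.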
