Publications

Auto-deconvolution and molecular networking of gas chromatography–mass spectrometry data

A. Aksenov, I. Laponogov, Z. Zhang, ..., J. Morton, et al.

We engineered a machine learning approach, MSHub, to enable auto-deconvolution of gas chromatography–mass spectrometry (GC–MS) data. We then designed workflows to enable the community to store, process, share, annotate, compare and perform molecular networking of GC–MS data within the Global Natural Product Social (GNPS) Molecular Networking analysis platform. MSHub/GNPS performs auto-deconvolution of compound fragmentation patterns via unsupervised non-negative matrix factorization and quantifies the reproducibility of fragmentation patterns across samples.
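To make the factorization step concrete, here is a minimal sketch of non-negative matrix factorization using the classic multiplicative updates, written in plain Python. It is not the MSHub implementation: the function names (`nmf`, `frob_error`), the toy data layout (rows as m/z channels, columns as scans), and the choice of the multiplicative-update algorithm are all assumptions made for illustration.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def frob_error(X, W, H):
    """Squared Frobenius norm of X - W H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def nmf(X, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix X (n x m) as W (n x rank) times H (rank x m).

    Illustrative reading for GC-MS data (an assumption, not MSHub's layout):
    columns of W act as fragmentation patterns, rows of H as elution profiles.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Strictly positive random start so the multiplicative updates can move.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H


if __name__ == "__main__":
    # Toy mixture: two made-up spectra (4 channels) with overlapping elution (5 scans).
    spectra = [[1.0, 0.0], [0.5, 0.2], [0.0, 1.0], [0.1, 0.7]]
    elution = [[0.0, 1.0, 2.0, 1.0, 0.0], [0.0, 0.0, 1.0, 2.0, 1.0]]
    X = matmul(spectra, elution)
    W, H = nmf(X, rank=2)
    print("reconstruction error:", frob_error(X, W, H))
```

Because the updates only multiply by non-negative ratios, the factors stay non-negative throughout, which is what lets them be read as spectra and abundances.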

A Framework for Multiphase Galactic Wind Launching Using TIGRESS

Chang-Goo Kim, Eve C. Ostriker, D. Fielding, M. Smith, G. Bryan, R. Somerville, J. Forbes, S. Genel, Lars Hernquist

Galactic outflows have density, temperature, and velocity variations at least as large as those of the multiphase, turbulent interstellar medium (ISM) from which they originate. We have conducted a suite of parsec-resolution numerical simulations using the TIGRESS framework, in which outflows emerge as a consequence of interaction between supernovae (SNe) and the star-forming ISM. The outflowing gas is characterized by two distinct thermal phases, cool ($T < 10^4$ K) and hot ($T > 10^6$ K), with most mass carried by the cool phase and most energy and newly-injected metals carried by the hot phase. Both components have a broad distribution of outflow velocity, and especially for cool gas this implies a varying fraction of escaping material depending on the halo potential. Informed by the TIGRESS results, we develop straightforward analytic formulae for the joint probability density functions (PDFs) of mass, momentum, energy, and metal loading as distributions in outflow velocity and sound speed. The model PDFs have only two parameters, SFR surface density $\Sigma_{\rm SFR}$ and the metallicity of the ISM, and fully capture the behavior of the original TIGRESS simulation PDFs over $\Sigma_{\rm SFR} \sim (10^{-4}, 1)\,M_\odot\,{\rm kpc^{-2}\,yr^{-1}}$. Employing PDFs from resolved simulations will enable galaxy formation subgrid model implementations with wind velocity and temperature (as well as total loading factors) that are based on theoretical predictions rather than empirical tuning. This is a critical step to incorporate advances from TIGRESS and other high-resolution simulations in future cosmological hydrodynamics and semi-analytic galaxy formation models. We release a Python package to prototype our model and to ease its implementation.

Shear-induced dispersion in peristaltic flow

B. Chakrabarti, D. Saintillan

The effective diffusivity of a Brownian tracer in unidirectional flow is well known to be enhanced due to shear by the classic phenomenon of Taylor dispersion. At long times, the average concentration of the tracer follows a simplified advection–diffusion equation with an effective shear-dependent dispersivity. In this work, we make use of the generalized Taylor dispersion theory for periodic domains to analyze tracer dispersion by peristaltic pumping. In channels with small aspect ratios, asymptotic expansions in the lubrication limit are employed to obtain analytical expressions for the dispersion coefficient at both small and high Péclet numbers. Channels of arbitrary aspect ratios are also considered using a boundary integral formulation for the fluid flow coupled to a conservation equation for the effective dispersivity, which is solved using the finite-volume method. Our theoretical calculations, which compare well with results from Brownian dynamics simulations, elucidate the effects of channel geometry and pumping strength on shear-induced dispersion. We further discuss the connection between the present problem and dispersion due to Taylor’s swimming sheet and interpret our results in the purely diffusive regime in the context of Fick–Jacobs theory. Our results provide the theoretical basis for understanding passive scalar transport in peristaltic flow, for instance, in the ureter or in microfluidic peristaltic pumps.

Protein Structural Alignments From Sequence

J. Morton, C. E.M. Strauss, R. Blackwell, D. Berenberg, V. Gligorijevic, R. Bonneau

Computing sequence similarity is a fundamental task in biology, with alignment forming the basis for the annotation of genes and genomes and providing the core data structures for evolutionary analysis. Standard approaches are a mainstay of modern molecular biology and rely on variations of edit distance to obtain explicit alignments between pairs of biological sequences. However, sequence alignment algorithms struggle with remote homology tasks and cannot identify similarities between many pairs of proteins with similar structures and likely homology. Recent work suggests that using machine learning language models can improve remote homology detection. To this end, we introduce DeepBLAST, which obtains explicit alignments from residue embeddings learned from a protein language model integrated into an end-to-end differentiable alignment framework. This approach can be accelerated on GPU architectures and outperforms conventional sequence alignment techniques in terms of both speed and accuracy when identifying structurally similar proteins.
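For readers unfamiliar with the edit-distance style of alignment that the abstract calls the standard approach, the sketch below shows a plain Needleman-Wunsch global alignment in Python. It illustrates the conventional baseline only, not DeepBLAST; the function name and the simple match/mismatch/gap scores are assumptions chosen for brevity (real protein aligners typically use substitution matrices and affine gap penalties).

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b by dynamic programming.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    The scoring values are illustrative defaults.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    s, x, y = needleman_wunsch("GATTACA", "GCATGCU")
    print(s)
    print(x)
    print(y)
```

The quadratic table fill is the part a differentiable framework would replace with a smoothed recursion; here the maximum is taken exactly.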

Generalized co-sparse factor regression

A. Mishra, Dipak K. Dey, Yong Chen, Kun Chen

Multivariate regression techniques are commonly applied to explore the associations between large numbers of outcomes and predictors. In real-world applications, the outcomes are often of mixed types, including continuous measurements, binary indicators, and counts, and the observations may also be incomplete. Building upon the recent advances in mixed-outcome modeling and sparse matrix factorization, generalized co-sparse factor regression (GOFAR) is proposed, which utilizes the flexible vector generalized linear model framework and encodes the outcome dependency through a sparse singular value decomposition (SSVD) of the integrated natural parameter matrix. To avoid the estimation of the notoriously difficult joint SSVD, GOFAR proposes both sequential and parallel unit-rank estimation procedures. By combining the ideas of alternating convex search and majorization–minimization, an efficient algorithm is developed to solve the sparse unit-rank problem and implemented in the R package gofar. Extensive simulation studies and two real-world applications demonstrate the effectiveness of the proposed approach.

Bayesian Workflow

Andrew Gelman, Aki Vehtari, Daniel Simpson, Charles C. Margossian, B. Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, Martin Modrák

The Bayesian approach to data analysis provides a powerful way to handle uncertainty in all observations, model parameters, and model structure using probability theory. Probabilistic programming languages make it easier to specify and fit Bayesian models, but this still leaves us with many options regarding constructing, evaluating, and using these models, along with many remaining challenges in computation. Using Bayesian inference to solve real-world problems requires not only statistical skills, subject matter knowledge, and programming, but also awareness of the decisions made in the process of data analysis. All of these aspects can be understood as part of a tangled workflow of applied Bayesian statistics. Beyond inference, the workflow also includes iterative model building, model checking, validation and troubleshooting of computational problems, model understanding, and model comparison. We review all these aspects of workflow in the context of several examples, keeping in mind that in practice we will be fitting many models for any given problem, even if only a subset of them will ultimately be relevant for our conclusions.

A Fast, Two-dimensional Gaussian Process Method Based on Celerite: Applications to Transiting Exoplanet Discovery and Characterization

Tyler Gordon, Eric Agol, D. Foreman-Mackey

Gaussian processes (GPs) are commonly used as a model of stochastic variability in astrophysical time series. In particular, GPs are frequently employed to account for correlated stellar variability in planetary transit light curves. The efficient application of GPs to light curves containing thousands to tens of thousands of data points has been made possible by recent advances in GP methods, including the celerite method. Here we present an extension of the celerite method to two input dimensions where, typically, the second dimension is small. This method scales linearly with the total number of data points when the noise in each large dimension is proportional to the same celerite kernel and only the amplitude of the correlated noise varies in the second dimension. We demonstrate the application of this method to the problem of measuring precise transit parameters from multiwavelength light curves and show that it has the potential to improve transit parameter measurements by orders of magnitude. Applications of this method include transit spectroscopy and exomoon detection, as well as a broader set of astronomical problems.

A Coupled Guiding Center–Boris Particle Pusher for Magnetized Plasmas in Compact-object Magnetospheres

Fabio Bacchini, B. Ripperda, S. Philippov, Kyle Parfrey

We present a novel numerical scheme for simulating the motion of relativistic charged particles in magnetospheres of compact objects, typically filled with highly magnetized collisionless plasmas. The new algorithm is based on a dynamic switch between the full system of equations of motion and a guiding-center approximation. The switch between the two formulations is based on the magnetization of the plasma particles, such that the dynamics are accurately captured by the guiding-center motion even when the gyrofrequency is underresolved by the time step. For particles with a large gyroradius, due to acceleration in, e.g., reconnecting current sheets, the algorithm adaptively switches to solve the full equations of motion instead. The new scheme is directly compatible with standard particle-in-cell codes, and is readily applicable in curved spacetimes via a dedicated covariant formulation. We test the performance of the coupled algorithm by evolving charged particles in electromagnetic configurations of reconnecting current sheets in magnetized plasma, obtained from special- and general-relativistic particle-in-cell simulations. The new coupled pusher is capable of producing highly accurate particle trajectories even when the time step is many orders of magnitude larger than the gyroperiod, substantially reducing the restrictions of the temporal resolution.
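As background for the "full equations of motion" half of the scheme, the following is a minimal sketch of a standard relativistic Boris push in Python. It is not the coupled guiding-center algorithm described above, and it omits the switching criterion and the covariant formulation entirely. The function name, the use of units with c = 1, and the representation of the particle by u = γv are assumptions made for this illustration.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def boris_push(x, u, E, B, q_over_m, dt):
    """Advance position x and u = gamma * v by one step dt (units with c = 1).

    E and B are the fields at the particle position, assumed constant
    over the step. Returns the updated (x, u).
    """
    half = 0.5 * q_over_m * dt
    # First half of the electric acceleration.
    u_minus = tuple(ui + half * Ei for ui, Ei in zip(u, E))
    gamma = math.sqrt(1.0 + sum(c * c for c in u_minus))
    # Magnetic rotation, built so that |u| is preserved exactly.
    t = tuple(half * Bi / gamma for Bi in B)
    t2 = sum(c * c for c in t)
    s = tuple(2.0 * ti / (1.0 + t2) for ti in t)
    u_prime = tuple(a + b for a, b in zip(u_minus, cross(u_minus, t)))
    u_plus = tuple(a + b for a, b in zip(u_minus, cross(u_prime, s)))
    # Second half of the electric acceleration.
    u_new = tuple(a + half * Ei for a, Ei in zip(u_plus, E))
    gamma_new = math.sqrt(1.0 + sum(c * c for c in u_new))
    x_new = tuple(xi + dt * ui / gamma_new for xi, ui in zip(x, u_new))
    return x_new, u_new


if __name__ == "__main__":
    x, u = (0.0, 0.0, 0.0), (1.0, 0.0, 0.5)
    for _ in range(1000):
        x, u = boris_push(x, u, E=(0.0, 0.0, 0.0), B=(0.0, 0.0, 1.0),
                          q_over_m=1.0, dt=0.1)
    print(x, u)
```

In a pure magnetic field the rotation step keeps the magnitude of u fixed, but the gyration phase is only accurate when the time step resolves the gyroperiod, which is the restriction a guiding-center description is meant to relax.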

Constraining the Halo Mass of Damped Lyα Absorption Systems (DLAs) at z=2-3.5 using the Quasar-CMB Lensing Cross-correlation

Xiaojing Lin, Zheng Cai, Y. Li, Alex Krolewski, Simone Ferraro

We study the cross-correlation of damped Lyα systems (DLAs) and their background quasars, using the most up-to-date DLA catalog and the Planck 2018 CMB lensing convergence field. Our measurement suggests that the DLA bias $b_{\rm DLA}$ is smaller than $3.1$, corresponding to $\log(M/M_\odot h^{-1})\leq 12.3$ at a confidence of $90\%$. These constraints are broadly consistent with Alonso et al. (2018) and previous measurements by cross-correlation between DLAs and the Lyα forest (e.g. Font-Ribera et al. 2012; Perez-Rafols et al. 2018). Further, our results demonstrate the potential of obtaining a more precise measurement of the halo mass of high-redshift sources using next-generation CMB experiments with a higher angular resolution. The Python-based codes and data products of our analysis are available at https://github.com/LittleLin1999/CMB-lensingxDLA.
