Publications

Adaptive Monte Carlo augmented with normalizing flows

M. Gabrié, Grant M. Rotskoff, E. Vanden-Eijnden

Many problems in the physical sciences, machine learning, and statistical inference necessitate sampling from a high-dimensional, multi-modal probability distribution. Markov Chain Monte Carlo (MCMC) algorithms, the ubiquitous tool for this task, typically rely on random, reversible, and local updates to propagate configurations of a given system in a way that ensures that generated configurations will be distributed according to a target probability distribution asymptotically. In high-dimensional settings with multiple relevant metastable basins, local approaches require either immense computational effort or intricately designed importance sampling strategies to capture information about, for example, the relative populations of such basins. Here we analyze a framework for augmenting MCMC sampling with nonlocal transition kernels parameterized with generative models known as normalizing flows. We focus on a setting where there is no preexisting data, as is commonly the case for problems in which MCMC is used. Our results emphasize that the implementation of the normalizing flow must be adapted to the structure of the target distribution in order to preserve the statistics of the target at all scales. Furthermore, we analyze the propensity of our algorithm to discover new states and demonstrate the importance of initializing the training with some

cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs

Yu-hsuan Shih, Garrett Wright, Joakim Andén, Johannes Blaschke, A. Barnett

Nonuniform fast Fourier transforms dominate the computational cost in many applications including image reconstruction and signal processing. We thus present a general-purpose GPU-based CUDA library for type 1 (nonuniform to uniform) and type 2 (uniform to nonuniform) transforms in dimensions 2 and 3, in single or double precision. It achieves high performance for a given user-requested accuracy, regardless of the distribution of nonuniform points, via cache-aware point reordering and load-balanced blocked spreading in shared memory. At low accuracies, this gives on-GPU throughputs around $10^9$ nonuniform points per second, and (even including host-device transfer) is typically 4-10$\times$ faster than the latest parallel CPU code FINUFFT (at 28 threads). It is competitive with two established GPU codes, being up to 90$\times$ faster at high accuracy and/or type 1 clustered point distributions. Finally, we demonstrate a 6-18$\times$ speedup versus CPU in an X-ray diffraction 3D iterative reconstruction task at $10^{-12}$ accuracy, observing excellent multi-GPU weak scaling up to one rank per GPU. Paper talk: https://www.youtube.com/watch?v=PnW6ehMyHxM
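As a reading aid, the two transform types named in this abstract can be written as direct (slow) sums. The following is a minimal NumPy sketch of the 2D definitions only, not cuFINUFFT's algorithm or API; the function names, the positive exponent sign, and the centered integer mode ordering are assumptions made for illustration.

```python
import numpy as np

def centered_modes(n):
    # Integer frequencies -n//2, ..., (n-1)//2 (assumed ordering).
    return np.arange(-(n // 2), (n + 1) // 2)

def nudft2d_type1(x, y, c, n1, n2):
    """Type 1 (nonuniform to uniform), evaluated by direct summation:
    F[a, b] = sum_j c[j] * exp(i * (k1[a] * x[j] + k2[b] * y[j]))."""
    e1 = np.exp(1j * np.outer(centered_modes(n1), x))  # shape (n1, M)
    e2 = np.exp(1j * np.outer(centered_modes(n2), y))  # shape (n2, M)
    return (e1 * c) @ e2.T                             # shape (n1, n2)

def nudft2d_type2(x, y, f):
    """Type 2 (uniform to nonuniform), evaluated by direct summation:
    v[j] = sum_{a,b} f[a, b] * exp(i * (k1[a] * x[j] + k2[b] * y[j]))."""
    n1, n2 = f.shape
    e1 = np.exp(1j * np.outer(x, centered_modes(n1)))  # shape (M, n1)
    e2 = np.exp(1j * np.outer(y, centered_modes(n2)))  # shape (M, n2)
    return np.einsum("ja,ab,jb->j", e1, f, e2)         # shape (M,)

# Small random problem: M nonuniform points, an n1 x n2 grid of modes.
rng = np.random.default_rng(0)
M, n1, n2 = 50, 6, 5
x = rng.uniform(-np.pi, np.pi, M)
y = rng.uniform(-np.pi, np.pi, M)
c = rng.normal(size=M) + 1j * rng.normal(size=M)
f = rng.normal(size=(n1, n2)) + 1j * rng.normal(size=(n1, n2))

F = nudft2d_type1(x, y, c, n1, n2)
v = nudft2d_type2(x, y, f)
```

With the same exponent sign, the two operations are transposes of each other, so `np.sum(f * F)` and `np.sum(c * v)` agree up to rounding. These direct sums cost on the order of M·n1·n2 operations, which is the cost that fast algorithms are designed to avoid.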

Calibration of the Hα Age–Activity Relation for M Dwarfs

Rocio Kiman, Jacqueline K. Faherty, Kelle L. Cruz, ..., R. Angus, et al.

In this work, we calibrate the relationship between Hα emission and M dwarf ages. We compile a sample of 892 M dwarfs with Hα equivalent width (Hα EW) measurements from the literature that are either co-moving with a white dwarf of known age (21 stars) or in a known young association (871 stars). In this sample we identify 7 M dwarfs that are new candidate members of known associations. By dividing the stars into active and inactive categories according to their Hα EW and spectral type (SpT), we find that the fraction of active dwarfs decreases with increasing age, and the form of the decline depends on SpT. Using the compiled sample of age calibrators, we find that Hα EW and fractional Hα luminosity ($L_{\mathrm{H}\alpha}/L_{\mathrm{bol}}$) decrease with increasing age. Hα EW for SpT < M7 decreases gradually up until ~1 Gyr. For older ages, we found only two early M dwarfs, which are both inactive and seem to continue the gradual decrease. We also found 14 mid-type M dwarfs, 11 of which are inactive and present a significant decrease of Hα EW, suggesting that the magnetic activity decreases rapidly after ~1 Gyr. We fit $L_{\mathrm{H}\alpha}/L_{\mathrm{bol}}$ versus age with a broken power law and find an index of $-0.11^{+0.02}_{-0.01}$ for ages ≲ 776 Myr. The index becomes much steeper at older ages; however, a lack of field age calibrators leaves this part of the relation far less constrained. Finally, from repeated independent measurements of the same stars, we find that 94% of them have a level of Hα EW variability ≤ 5 Å at young ages (< 1 Gyr).
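For readers unfamiliar with the functional form, a broken power law of the kind fitted to the fractional Hα luminosity can be sketched as follows. Only the break age (776 Myr) and the young-age index (-0.11) come from the abstract; the normalization and the old-age index are hypothetical placeholders chosen for illustration, and the function name is an assumption.

```python
import numpy as np

def broken_power_law(age_myr, norm, break_myr, index_young, index_old):
    """Piecewise power law in age, continuous at the break.

    Below the break the log-log slope is index_young; above it, index_old.
    `norm` is the value of the function at the break age.
    """
    age = np.asarray(age_myr, dtype=float)
    scaled = age / break_myr
    return np.where(age < break_myr,
                    norm * scaled ** index_young,
                    norm * scaled ** index_old)

BREAK_MYR = 776.0     # from the abstract
INDEX_YOUNG = -0.11   # from the abstract (central value)
INDEX_OLD = -1.0      # hypothetical: the abstract only says "much steeper"
NORM = 1e-4           # hypothetical normalization at the break

ages = np.array([10.0, 100.0, 776.0, 5000.0])  # ages in Myr
values = broken_power_law(ages, NORM, BREAK_MYR, INDEX_YOUNG, INDEX_OLD)
```

Writing both branches in terms of `age / break_myr` guarantees that the two pieces meet at the break, so only the slope changes there.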

Chemodynamically Characterizing the Jhelum Stellar Stream with APOGEE-2

Allyson A. Sheffield, Aidan Z. Subrahimovic, Mohammad Refat, ..., A. Price-Whelan, et al.

We present the kinematic and chemical profiles of red giant stars observed by the APOGEE-2 survey in the direction of the Jhelum stellar stream, a Milky Way substructure located in the inner halo at a distance from the Sun of ≈ 13 kpc. From the six APOGEE-2 Jhelum pointings, we isolate stars with log(g) < 3.5, leaving a sample of 289 red giant stars. From this sample of APOGEE giants, we identified seven stars that are consistent with the astrometric signal from Gaia DR2 for this stream. Of these seven, one falls onto the RGB along the same sequence as the Jhelum stars presented by \cite{ji20}. This new Jhelum member has [Fe/H] = -2.2 and is at the tip of the red giant branch. By selecting high orbital eccentricity, metal-rich stars, we identify red giants in our APOGEE sample that are likely associated with the Gaia-Enceladus-Sausage (GES) merger. We compare the abundance profiles of the Jhelum stars and GES stars and find similar trends in α-elements, as expected for low-metallicity populations. However, we find that the orbits for GES and Jhelum stars are not generally consistent with a shared origin. The chemical abundances for the APOGEE Jhelum star and other confirmed members of the stream are similar to stars in known stellar streams and thus are consistent with an accreted dwarf galaxy origin for the progenitor of the stream, although we cannot rule out a globular cluster origin.

A Stellar Activity F-statistic for Exoplanet Surveys (SAFE)

Parker H. Holzer, Jessi Cisewski-Kehe, L. Zhao, Eric B. Ford, Christian Gilbertson, Debra A. Fischer

In the search for planets orbiting distant stars, the presence of stellar activity in the atmospheres of observed stars can obscure the radial velocity signal used to detect such planets. Furthermore, this stellar activity contamination is set by the star itself and cannot simply be avoided with better instrumentation. Various stellar activity indicators have been developed that may correlate with this contamination. We introduce a new stellar activity indicator called the Stellar Activity F-statistic for Exoplanet surveys (SAFE) that has higher statistical power (i.e., probability of detecting a true stellar activity signal) than many traditional stellar activity indicators in a simulation study of an active region on a Sun-like star with moderate to high signal-to-noise. Also through simulation, the SAFE is demonstrated to be associated with the projected area on the visible side of the star covered by active regions. We also demonstrate that the SAFE detects statistically significant stellar activity in most of the spectra for HD 22049, a star known to have high stellar variability. Additionally, the SAFE is calculated for recent observations of the three low-variability stars HD 34411, HD 10700, and HD 3651, the latter of which is known to have a planetary companion. As expected, the SAFE for these three only occasionally detects activity. Furthermore, initial exploration appears to indicate that the SAFE may be useful for disentangling stellar activity signals from planet-induced Doppler shifts.

CMB lensing power spectrum estimation without instrument noise bias

Mathew S. Madhavacheril, Kendrick M. Smith, Blake D. Sherwin, S. Naess

The power spectrum of cosmic microwave background (CMB) lensing will be measured to sub-percent precision with upcoming surveys, enabling tight constraints on the sum of neutrino masses and other cosmological parameters. Measuring the lensing power spectrum involves the estimation of the connected trispectrum of the four-point function of the CMB map, which requires the subtraction of a large Gaussian disconnected noise bias. This reconstruction noise bias receives contributions both from CMB and foreground fluctuations as well as instrument noise (both detector and atmospheric noise for ground-based surveys). The debiasing procedure therefore relies on the quality of simulations of the instrument noise which may be expensive or inaccurate. We propose a new estimator that makes use of at least four splits of the CMB maps with independent instrument noise. This estimator makes the CMB lensing power spectrum completely insensitive to any assumptions made in modeling or simulating the instrument noise. We show that this estimator, in many practical situations, leads to no substantial loss in signal-to-noise. We provide an efficient algorithm for its computation that scales with the number of splits $m$ as $\mathcal{O}(m^2)$ as opposed to a naive $\mathcal{O}(m^4)$ expectation.
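The idea of combining only splits with independent noise can be illustrated at the simpler two-point level. The sketch below is an analogy under stated assumptions, not the four-point lensing estimator of the paper: the function names and the toy model (a unit-variance signal shared by all splits plus independent white noise per split) are hypothetical. It also shows how, in this two-point toy, the sum over pairs of distinct splits can be obtained from the sum of the splits instead of an explicit pair loop.

```python
import numpy as np

def cross_power_naive(splits):
    """Average of splits[i] * splits[j] over all ordered pairs i != j."""
    m = len(splits)
    acc = np.zeros_like(splits[0], dtype=float)
    for i in range(m):
        for j in range(m):
            if i != j:
                acc += splits[i] * splits[j]
    return acc / (m * (m - 1))

def cross_power_fast(splits):
    """Same quantity via (sum_i a_i)^2 - sum_i a_i^2, with no pair loop."""
    splits = np.asarray(splits, dtype=float)
    m = splits.shape[0]
    total = splits.sum(axis=0)
    auto = (splits ** 2).sum(axis=0)
    return (total ** 2 - auto) / (m * (m - 1))

# Toy data: every split sees the same signal but its own noise realization.
rng = np.random.default_rng(1)
n_modes, m = 200_000, 4
signal = rng.normal(0.0, 1.0, n_modes)                  # signal variance 1
splits = signal + rng.normal(0.0, 2.0, (m, n_modes))    # noise variance 4

auto_estimate = np.mean(splits[0] ** 2)                 # biased by noise power
cross_estimate = np.mean(cross_power_fast(splits))      # noise terms average out
```

In this toy, `auto_estimate` includes the noise variance while `cross_estimate` stays close to the signal variance without any model of the noise, which is the property the split-based approach relies on.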

Single nucleus multi-omics regulatory atlas of the murine pituitary

F Ruf-Zamojski, Z. Zhang, M Zamojski, O. Troyanskaya, S Sealfon, et al.

To provide a multi-omics resource and investigate transcriptional regulatory mechanisms, we profile the transcriptome, chromatin accessibility, and methylation status of over 70,000 single nuclei (sn) from adult mouse pituitaries. Paired snRNAseq and snATACseq datasets from individual animals highlight a continuum between developmental epigenetically-encoded cell types and transcriptionally-determined transient cell states. Co-accessibility analysis-based identification of a putative Fshb cis-regulatory domain that overlaps the fertility-linked rs11031006 human polymorphism, followed by experimental validation, illustrates the use of this resource for hypothesis generation. We also identify transcriptional and chromatin accessibility programs distinguishing each major cell type. Regulons, which are co-regulated gene sets sharing binding sites for a common transcription factor driver, recapitulate cell type clustering. We identify both cell type-specific and sex-specific regulons that are highly correlated with promoter accessibility, but not with methylation state, supporting the centrality of chromatin accessibility in shaping cell-defining transcriptional programs. The sn multi-omics atlas is accessible at snpituitaryatlas.princeton.edu.

AI-assisted super-resolution cosmological simulations

Y. Li, Yueying Ni, Rupert A. C. Croft, Tiziana Di Matteo, Simeon Bird, Yu Feng

Cosmological simulations of galaxy formation are limited by finite computational resources. We draw from the ongoing rapid advances in Artificial Intelligence (specifically Deep Learning) to address this problem. Neural networks have been developed to learn from high-resolution (HR) image data, and then make accurate super-resolution (SR) versions of different low-resolution (LR) images. We apply such techniques to LR cosmological N-body simulations, generating SR versions. Specifically, we are able to enhance the simulation resolution by generating 512 times more particles and predicting their displacements from the initial positions. Therefore our results can be viewed as new simulation realizations themselves rather than projections, e.g., to their density fields. Furthermore, the generation process is stochastic, enabling us to sample the small-scale modes conditioning on the large-scale environment. Our model learns from only 16 pairs of LR-HR simulations, and is then able to generate SR simulations that successfully reproduce the matter power spectrum and the halo mass function of the HR targets. We successfully deploy the model in a box 1000 times larger than the training simulation box, showing that high-resolution mock surveys can be generated rapidly. We conclude that AI assistance has the potential to revolutionize modeling of small-scale galaxy formation physics in large cosmological volumes.

Recovering missing data in coherent diffraction imaging

D. Barmherzig, A. Barnett, C. Epstein, L. Greengard, J. Magland, M. Rachh

In coherent diffraction imaging (CDI) experiments, the intensity of the scattered wave impinging on an object is measured on an array of detectors. This signal can be interpreted as the square of the modulus of the Fourier transform of the unknown scattering density. A beam-stop obstructs the forward scattered wave and, hence, the modulus Fourier data from a neighborhood of k=0 cannot be measured. In this note, we describe a linear method for recovering this unmeasured modulus Fourier data from the measured values and an estimate of the support of the image's autocorrelation function without consideration of phase retrieval. We analyze the conditioning of this problem, which grows exponentially with the modulus of the maximum spatial frequency not measured, and the effects of noise.

Single-cell gene regulatory network inference at scale: The Inferelator 3.0

C. Skok Gibbs, A. Watters, N. Carriero, R. Bonneau, et al.

Gene regulatory networks define regulatory relationships between transcription factors and target genes within a biological system, and reconstructing them is essential for understanding cellular growth and function. In this work, we present the Inferelator 3.0, which has been significantly updated to integrate data from distinct cell types to learn context-specific regulatory networks and aggregate them into a shared regulatory network, while retaining the functionality of the previous versions. The Inferelator 3.0 reliably learns informative networks from the model organisms Bacillus subtilis and Saccharomyces cerevisiae. We demonstrate its capabilities by learning networks for multiple distinct neuronal and glial cell types in the developing Mus musculus brain at E18 from a large (1.3 million) single-cell gene expression data set with paired single-cell chromatin accessibility data.
