2697 Publications

The local nanohertz gravitational-wave landscape from supermassive black hole binaries.

C. Mingarelli, T J Lazio, A Sesana, J Greene, J Ellis, C Ma, S Croft, S Burke-Spolaor, S Taylor

Supermassive black hole binary systems form in galaxy mergers and reside in galactic nuclei with large and poorly constrained concentrations of gas and stars. These systems emit nanohertz gravitational waves that will be detectable by pulsar timing arrays. Here we estimate the properties of the local nanohertz gravitational-wave landscape that includes individual supermassive black hole binaries emitting continuous gravitational waves and the gravitational-wave background that they generate. Using the 2 Micron All-Sky Survey, together with galaxy merger rates from the Illustris simulation project, we find that there are on average 91 ± 7 continuous nanohertz gravitational-wave sources, and 7 ± 2 binaries that will never merge, within 225 Mpc. These local unresolved gravitational-wave sources can generate a departure from an isotropic gravitational-wave background at a level of about 20 per cent, and if the cosmic gravitational-wave background can be successfully isolated, gravitational waves from at least one local supermassive black hole binary could be detected in 10 years with pulsar timing arrays.

November 13, 2017

Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder

A Krishnan, R Zhang, V Yao, C Theesfeld, A. Wong, A Tadych, N. Volfovsky, Alan Packer, Ph.D., O. Troyanskaya

Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder with a strong genetic basis. Yet only a small fraction of potentially causal genes (about 65 out of an estimated several hundred) are known with strong genetic evidence from sequencing studies. We developed a complementary machine-learning approach based on a human brain-specific gene network to present a genome-wide prediction of autism risk genes, including hundreds of candidates for which there is minimal or no prior genetic evidence. Our approach was validated in a large independent case-control sequencing study. Leveraging these genome-wide predictions and the brain-specific network, we demonstrated that the large set of ASD genes converges on a smaller number of key pathways and developmental stages of the brain. Finally, we identified likely pathogenic genes within frequent autism-associated copy-number variants and proposed genes and pathways that are likely mediators of ASD across multiple copy-number variants. All predictions and functional insights are available at http://asd.princeton.edu.

Hack Weeks as a model for Data Science Education and Collaboration

D. Huppenkothen, A. Arendt, D. Hogg, K. Ram, J. VanderPlas, A. Rokem

Across almost all scientific disciplines, the instruments that record our experimental data and the methods required for storage and data analysis are rapidly increasing in complexity. This gives rise to the need for scientific communities to adapt on shorter time scales than traditional university curricula allow for, and therefore requires new modes of knowledge transfer. The universal applicability of data science tools to a broad range of problems has generated new opportunities to foster exchange of ideas and computational workflows across disciplines. In recent years, hack weeks have emerged as an effective tool for fostering these exchanges by providing training in modern data analysis workflows. While there are variations in hack week implementation, all events consist of a common core of three components: tutorials in state-of-the-art methodology, peer-learning and project work in a collaborative environment. In this paper, we present the concept of a hack week in the larger context of scientific meetings and point out similarities and differences to traditional conferences. We motivate the need for such an event and present in detail its strengths and challenges. We find that hack weeks are successful at cultivating collaboration and the exchange of knowledge. Participants self-report that these events help them both in their day-to-day research and in their careers. Based on our results, we conclude that hack weeks present an effective, easy-to-implement, fairly low-cost tool to positively impact data analysis literacy in academic disciplines, foster collaboration and cultivate best practices.

October 31, 2017

Linear models for systematics and nuisances

R. Luger, D. Foreman-Mackey, D. Hogg

The target of many astronomical studies is the recovery of tiny astrophysical signals living in a sea of uninteresting (but usually dominant) noise. In many contexts (e.g., stellar time series, high-contrast imaging, or stellar spectroscopy), there are structured components in this noise caused by systematic effects in the astronomical source, the atmosphere, the telescope, or the detector. More often than not, evaluation of the true physical model for these nuisances is computationally intractable and dependent on too many (unknown) parameters to allow rigorous probabilistic inference. Sometimes, housekeeping data (and often the science data themselves) can be used as predictors of the systematic noise. Linear combinations of simple functions of these predictors are often used as computationally tractable models that can capture the nuisances. These models can be used to fit and subtract systematics prior to investigation of the signals of interest, or they can be used in a simultaneous fit of the systematics and the signals. In this Note, we show that if a Gaussian prior is placed on the weights of the linear components, the weights can be marginalized out with an operation in pure linear algebra, which can (often) be made fast. We illustrate this model by demonstrating the applicability of a linear model for the non-linear systematics in K2 time-series data, where the dominant noise source for many stars is spacecraft motion and variability.
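
The marginalization described in this abstract can be illustrated with a minimal sketch, which is not the authors' code. It assumes data y = A w + noise, with diagonal noise covariance C and a zero-mean Gaussian prior on the weights w with diagonal covariance Lambda, so that the marginal distribution of y is Gaussian with covariance C + A Lambda A^T. All names (`A`, `C_diag`, `Lam_diag`, and both functions) are hypothetical. The second function uses the Woodbury identity and the matrix determinant lemma, one standard way such an operation can be made fast when there are far fewer predictors than data points.

```python
import numpy as np


def marginal_loglike_direct(y, A, C_diag, Lam_diag):
    """log N(y | 0, C + A Lam A^T), built densely; cost scales as N^3."""
    n = len(y)
    # (A * Lam_diag) scales each column of A by the matching prior variance.
    K = np.diag(C_diag) + (A * Lam_diag) @ A.T
    _, logdet = np.linalg.slogdet(K)
    quad = y @ np.linalg.solve(K, y)
    return -0.5 * (quad + logdet + n * np.log(2.0 * np.pi))


def marginal_loglike_woodbury(y, A, C_diag, Lam_diag):
    """Same quantity via the Woodbury identity; cost scales as N * M^2."""
    n = len(y)
    Cinv_y = y / C_diag
    Cinv_A = A / C_diag[:, None]
    # Small M x M system: S = Lam^-1 + A^T C^-1 A.
    S = np.diag(1.0 / Lam_diag) + A.T @ Cinv_A
    b = A.T @ Cinv_y
    quad = y @ Cinv_y - b @ np.linalg.solve(S, b)
    # Determinant lemma: det(C + A Lam A^T) = det(S) det(Lam) det(C).
    _, logdet_S = np.linalg.slogdet(S)
    logdet = logdet_S + np.sum(np.log(Lam_diag)) + np.sum(np.log(C_diag))
    return -0.5 * (quad + logdet + n * np.log(2.0 * np.pi))


# Synthetic demonstration (all values are made up for illustration).
rng = np.random.default_rng(0)
n_data, n_pred = 200, 4
A = rng.normal(size=(n_data, n_pred))          # predictors of the systematics
C_diag = np.full(n_data, 0.1 ** 2)             # per-point noise variances
Lam_diag = np.full(n_pred, 2.0 ** 2)           # prior variances on the weights
w_true = rng.normal(scale=2.0, size=n_pred)
y = A @ w_true + rng.normal(scale=0.1, size=n_data)

ll_direct = marginal_loglike_direct(y, A, C_diag, Lam_diag)
ll_fast = marginal_loglike_woodbury(y, A, C_diag, Lam_diag)
print(ll_direct, ll_fast)
```

Both functions evaluate the same marginal likelihood; the second only ever solves an M x M system, where M is the number of linear components.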

October 30, 2017

Data analysis recipes: Using Markov Chain Monte Carlo

D. Hogg, D. Foreman-Mackey

Markov Chain Monte Carlo (MCMC) methods for sampling probability density functions (combined with abundant computational resources) have transformed the sciences, especially in performing probabilistic inferences, or fitting models to data. In this primarily pedagogical contribution, we give a brief overview of the most basic MCMC method and some practical advice for the use of MCMC in real inference problems. We give advice on method choice, tuning for performance, methods for initialization, tests of convergence, troubleshooting, and use of the chain output to produce or report parameter estimates with associated uncertainties. We argue that autocorrelation time is the most important test for convergence, as it directly connects to the uncertainty on the sampling estimate of any quantity of interest. We emphasize that sampling is a method for doing integrals; this guides our thinking about how MCMC output is best used.
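
As a rough illustration of the ideas in this abstract, and not code from the paper, the sketch below runs a one-dimensional Metropolis sampler and estimates the integrated autocorrelation time of the chain. The target density (a unit Gaussian), the step size, the window constant `c`, and all function names are assumptions chosen for the example. The autocorrelation time tau sets the effective number of independent samples, roughly n / tau, and therefore the sampling uncertainty on the chain mean.

```python
import math
import random


def metropolis(log_prob, x0, n_steps, step_size, rng):
    """Metropolis sampler with a symmetric Gaussian proposal (1-D)."""
    x, lp = x0, log_prob(x0)
    chain = []
    for _ in range(n_steps):
        x_new = x + step_size * rng.gauss(0.0, 1.0)
        lp_new = log_prob(x_new)
        # Accept with probability min(1, p(x_new) / p(x)).
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            x, lp = x_new, lp_new
        chain.append(x)  # a rejected step repeats the current position
    return chain


def integrated_autocorr_time(chain, c=5.0):
    """Estimate tau = 1 + 2 * sum_k rho(k), stopping at the first lag >= c * tau."""
    n = len(chain)
    mean = sum(chain) / n
    dev = [v - mean for v in chain]
    var = sum(d * d for d in dev) / n
    tau = 1.0
    for lag in range(1, n):
        rho = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        tau += 2.0 * rho
        if lag >= c * tau:
            break
    return tau


# Hypothetical target: a unit Gaussian, log p(x) = -x^2 / 2 up to a constant.
rng = random.Random(42)
chain = metropolis(lambda x: -0.5 * x * x, 0.0, 20000, 2.5, rng)

n = len(chain)
mean = sum(chain) / n
std = math.sqrt(sum((v - mean) ** 2 for v in chain) / n)
tau = integrated_autocorr_time(chain)
n_eff = n / tau
mean_err = std / math.sqrt(n_eff)  # sampling uncertainty on the mean
print(mean, tau, n_eff, mean_err)
```

The chain mean here is itself a Monte Carlo estimate of an integral (the expectation of x under the target), which is the sense in which sampling is a method for doing integrals.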

October 17, 2017

Symmetry breaking in occupation number based slave-particle methods

A. Georgescu, Sohrab Ismail-Beigi

We describe a theoretical approach to finding spontaneously symmetry-broken electronic phases due to strong electronic interactions when using recently developed slave-particle (slave-boson) approaches based on occupation numbers. We describe why, to date, spontaneous symmetry breaking has proven difficult to achieve in such approaches. We then provide a total energy based approach for introducing auxiliary symmetry-breaking fields into the solution of the slave-particle problem that leads to lowered total energies for symmetry-broken phases. We point out that not all slave-particle approaches yield energy lowering: the slave-particle model being used must explicitly describe the degrees of freedom that break symmetry. Finally, our total energy approach permits us to greatly simplify the formalism used to achieve a self-consistent solution between spinon and slave modes while increasing the numerical stability and greatly speeding up the calculations.

Identifying direct contacts between protein complex subunits from their conditional dependence in proteomics datasets

Kevin Drew, C. Müller, R. Bonneau, Edward M Marcotte

Determining the three-dimensional arrangement of proteins in a complex is highly beneficial for uncovering mechanistic function and interpreting genetic variation in coding genes comprising protein complexes. There are several methods for determining co-complex interactions between proteins, among them co-fractionation / mass spectrometry (CF-MS), but it remains difficult to identify directly contacting subunits within a multi-protein complex. Correlation analysis of CF-MS profiles shows promise in detecting protein complexes as a whole but is limited in its ability to infer direct physical contacts among proteins in sub-complexes. To identify direct protein-protein contacts within human protein complexes, we learn a sparse conditional dependency graph from approximately 3,000 CF-MS experiments on human cell lines. We show substantial performance gains in estimating direct interactions compared to correlation analysis on a benchmark of large protein complexes with solved three-dimensional structures. We demonstrate the method's value in determining the three-dimensional arrangement of proteins by making predictions for complexes without known structure (the exocyst and tRNA multi-synthetase complex) and by establishing evidence for the structural position of a recently discovered component of the core human EKC/KEOPS complex, GON7/C14ORF142, providing a more complete 3D model of the complex. Direct contact prediction provides easily calculable additional structural information for large-scale protein complex mapping studies and should be broadly applicable across organisms as more CF-MS datasets become available.
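
The distinction this abstract draws between correlation and conditional dependence can be shown with a toy example. This is not the authors' method: the sketch substitutes plain partial correlations, computed from an unregularized inverse covariance matrix, for the sparse graph estimation the abstract describes, and the three synthetic "profiles" `a`, `b`, and `c` are invented for illustration. In the toy data, `a` influences `c` only through `b`, so `a` and `c` are strongly correlated but nearly independent once `b` is accounted for.

```python
import numpy as np


def partial_correlations(X):
    """Partial correlation of each pair of columns given all other columns."""
    cov = np.cov(X, rowvar=False)
    prec = np.linalg.inv(cov)          # precision (inverse covariance) matrix
    d = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(d, d)     # rho_ij = -P_ij / sqrt(P_ii * P_jj)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


# Synthetic chain a -> b -> c: a and c interact only indirectly, through b.
rng = np.random.default_rng(0)
n = 5000
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)
c = b + 0.5 * rng.normal(size=n)
X = np.column_stack([a, b, c])

corr = np.corrcoef(X, rowvar=False)
pcorr = partial_correlations(X)
print("corr(a, c)         =", corr[0, 2])
print("partial corr(a, c) =", pcorr[0, 2])
print("partial corr(a, b) =", pcorr[0, 1])
```

With thousands of profiles and many proteins, a plain matrix inverse like this becomes unstable, which is one motivation for sparse estimators of the kind the abstract refers to.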

Monte Carlo Tensor Network Renormalization

William Huggins, C. Daniel Freeman, Norm M. Tubman, Birgitta Whaley

Techniques for approximately contracting tensor networks are limited in how efficiently they can make use of parallel computing resources. In this work we demonstrate and characterize a Monte Carlo approach to the tensor network renormalization group method which can be used straightforwardly on modern computing architectures. We demonstrate the efficiency of the technique and show that Monte Carlo tensor network renormalization provides an attractive path to improving the accuracy of a wide class of challenging computations while also providing useful estimates of uncertainty and a statistical guarantee of unbiased results.

October 10, 2017