Many high-throughput sequencing data sets in biology are compositional in nature. A prominent example is microbiome profiling data, including targeted amplicon-based and metagenomic sequencing data. These data comprise surveys of microbial communities in their natural habitat, recorded as sparse proportional (or compositional) read counts that represent operational taxonomic units or genes. When paired measurements of other covariates, including physicochemical properties of the habitat or phenotypic variables of the host, are available, inferring parsimonious and robust statistical relationships between the microbial abundance data and the covariate measurements is often an important first step in exploratory data analysis. To this end, we propose a sparse robust statistical regression framework that considers compositional and non-compositional measurements as predictors and identifies outliers in continuous response variables. Our model extends the seminal log-contrast model of Aitchison and Bacon-Shone (1984) by a mean shift formulation for capturing outliers, sparsity-promoting convex and non-convex penalties for parsimonious model selection, and data-driven robust initialization procedures adapted to the compositional setting. We show, in theory and simulations, the ability of our approach to jointly select a sparse set of predictive microbial features and identify outliers in the response. We illustrate the viability of our method by robustly predicting human body mass indices from American Gut Project amplicon data and non-compositional covariate data. We believe that the robust estimators introduced here, available in the R package RobRegCC, can serve as a practical tool for reliable statistical regression analysis of compositional data, including microbiome survey data.
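The mean-shift idea in the abstract can be illustrated in a few lines. This is not the RobRegCC estimator itself (which adds compositional constraints, robust initialization, and non-convex penalties); it is a minimal NumPy sketch of the generic mean-shift outlier model y = Xβ + γ + ε with sparse γ, fit by alternating ordinary least squares with soft-thresholding of the residuals. All names and parameter choices here are illustrative assumptions.

```python
import numpy as np

def soft_threshold(r, lam):
    # Soft-thresholding: shrinks small residuals to zero, leaving
    # large (outlier) residuals as nonzero mean shifts.
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def mean_shift_regression(X, y, lam=1.0, n_iter=50):
    """Alternate between OLS on the shift-corrected response and
    soft-thresholding of the residuals to estimate sparse outlier shifts."""
    gamma = np.zeros_like(y)
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        gamma = soft_threshold(y - X @ beta, lam)
    return beta, gamma

rng = np.random.default_rng(0)
n, p = 50, 2
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)
y[3] += 10.0                      # plant one gross outlier
beta_hat, gamma_hat = mean_shift_regression(X, y)
```

The estimated shift vector flags observation 3 as the outlier while the regression coefficients are recovered from the remaining clean data.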
Numerical methods for approximately solving partial differential equations (PDEs) are at the core of scientific computing. Often, this requires high-resolution or adaptive discretization grids to capture relevant spatio-temporal features in the PDE solution, e.g., in applications like turbulence, combustion, and shock propagation. Numerical approximation also requires knowing the PDE in order to construct problem-specific discretizations. Systematically deriving such solution-adaptive discrete operators, however, remains a challenge. Here we present an artificial neural network architecture for data-driven learning of problem- and resolution-specific local discretizations of nonlinear PDEs. Our proposed method achieves numerically stable discretization of the operators in an unknown nonlinear PDE by spatially and temporally adaptive parametric pooling on regular Cartesian grids, and by incorporating knowledge about discrete time integration. Knowing the actual PDE is not necessary, as solution data are sufficient to train the network to learn the discrete operators. A once-trained neural network model can be used to predict solutions of the PDE on larger spatial domains and for longer times than it was trained for, hence addressing the problem of PDE-constrained extrapolation from data. We present demonstrative examples on long-term forecasting of hard numerical problems, including equation-free forecasting of the nonlinear dynamics of the forced Burgers problem on coarse spatio-temporal grids.
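The core idea, learning a local discrete operator from solution data alone, can be shown in a toy linear setting. This is not the paper's neural architecture; it is a least-squares sketch that learns a 3-point stencil mapping local solution values to the time derivative, using snapshots of an assumed reference solution of linear advection u_t = -c u_x as training data.

```python
import numpy as np

# Learn a local 3-point stencil from solution snapshots, without using
# the PDE itself at fit time: only (u, u_t) sample pairs are provided.
c, N = 1.0, 128
x = np.linspace(0, 2 * np.pi, N, endpoint=False)
h = x[1] - x[0]

feats, targets = [], []
for t in np.linspace(0.0, 1.0, 20):
    for k in (1, 2, 3):                        # several wavenumbers in the data
        u = np.sin(k * (x - c * t))
        ut = -c * k * np.cos(k * (x - c * t))  # exact time derivative
        # local neighborhoods (u_{i-1}, u_i, u_{i+1}) on the periodic grid
        feats.append(np.stack([np.roll(u, 1), u, np.roll(u, -1)], axis=1))
        targets.append(ut)
F = np.vstack(feats)
g = np.concatenate(targets)

coeffs, *_ = np.linalg.lstsq(F, g, rcond=None)   # learned stencil weights
rel_resid = np.linalg.norm(F @ coeffs - g) / np.linalg.norm(g)
```

The fitted weights come out close to the antisymmetric central-difference stencil [c/(2h), 0, -c/(2h)], i.e., the data alone recover a consistent discretization of the unknown operator.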
We introduce c-lasso, a Python package that enables sparse and robust linear regression and classification with linear equality constraints. The underlying statistical forward model is assumed to be of the form

y = Xβ + σε   subject to   Cβ = 0.

Here, X ∈ ℝ
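The equality-constrained least-squares core of this forward model can be solved directly via its KKT system. The helper below is an illustrative sketch, not part of the c-lasso API (which additionally handles sparsity-inducing penalties and robust losses); the zero-sum constraint C = (1, …, 1) is the standard choice for compositional predictors.

```python
import numpy as np

def constrained_lstsq(X, y, C):
    """Solve min ||y - X b||^2 subject to C b = 0 via the KKT system."""
    d = X.shape[1]
    k = C.shape[0]
    KKT = np.block([[X.T @ X, C.T],
                    [C, np.zeros((k, k))]])
    rhs = np.concatenate([X.T @ y, np.zeros(k)])
    return np.linalg.solve(KKT, rhs)[:d]      # drop the Lagrange multipliers

rng = np.random.default_rng(1)
n, d = 100, 5
X = rng.normal(size=(n, d))
C = np.ones((1, d))                           # zero-sum (compositional) constraint
beta_true = np.array([1.0, -1.0, 2.0, -2.0, 0.0])   # satisfies C @ beta_true = 0
y = X @ beta_true + 0.01 * rng.normal(size=n)
beta_hat = constrained_lstsq(X, y, C)
```

The solution satisfies the constraint to machine precision while fitting the data in the least-squares sense.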
Several antimicrobial peptides, including magainin and the human cathelicidin LL-37, act by forming pores in bacterial membranes. Bacteria such as Staphylococcus aureus modify their membrane's cardiolipin composition to resist such perturbations that compromise membrane stability. Here, we used molecular dynamics simulations to quantify the role of cardiolipin in the formation of pores in simple bacterial-like membrane models composed of phosphatidylglycerol and cardiolipin mixtures. Cardiolipin modified the structure and ordering of the lipid bilayer, making it less susceptible to mechanical changes. Accordingly, both the free-energy barrier to forming a transmembrane pore and the pore's kinetic instability increased with increasing cardiolipin concentration. We attribute this to the unfavorable positioning of cardiolipin near the formed pore, due to its small polar head and bulky hydrophobic body. Overall, our study demonstrates how cardiolipin prevents membrane-pore formation, which constitutes a plausible mechanism by which bacteria act against stress perturbations and thereby gain resistance to antimicrobial agents.
The efficiency of Hamiltonian Monte Carlo (HMC) can suffer when sampling a distribution with a wide range of length scales, because the small step sizes needed for stability in high-curvature regions are inefficient elsewhere. To address this we present a delayed rejection variant: if an initial HMC trajectory is rejected, we make one or more subsequent proposals each using a step size geometrically smaller than the last. We extend the standard delayed rejection framework by allowing the probability of a retry to depend on the probability of accepting the previous proposal. We test the scheme in several sampling tasks, including multiscale model distributions such as Neal's funnel, and statistical applications. Delayed rejection enables up to five-fold performance gains over optimally-tuned HMC, as measured by effective sample size per gradient evaluation. Even for simpler distributions, delayed rejection provides increased robustness to step size misspecification. Along the way, we provide an accessible but rigorous review of detailed balance for HMC.
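The step-size trade-off motivating the delayed-rejection scheme is easy to demonstrate. The sketch below is not the paper's delayed-rejection correction; it only shows the underlying phenomenon: leapfrog integration on a 1D Gaussian target is stable only for step sizes below roughly twice the local length scale, so a step tuned for a wide region blows up in a narrow one, and a geometrically reduced step recovers accuracy.

```python
import numpy as np

def leapfrog_energy_error(sigma, eps, n_steps=20):
    """Energy drift of leapfrog on a 1D Gaussian target with scale sigma.
    Potential U(q) = q^2 / (2 sigma^2), so grad U(q) = q / sigma^2."""
    q, p = sigma, 1.0
    H0 = 0.5 * (q / sigma) ** 2 + 0.5 * p ** 2
    for _ in range(n_steps):
        p -= 0.5 * eps * q / sigma ** 2
        q += eps * p
        p -= 0.5 * eps * q / sigma ** 2
    H1 = 0.5 * (q / sigma) ** 2 + 0.5 * p ** 2
    return abs(H1 - H0)

# For this quadratic potential the stability threshold is eps < 2*sigma:
# eps = 0.1 is fine at sigma = 1, explodes at sigma = 0.01, and halving
# the step a few times (as in delayed rejection) restores stability.
```

For example, `leapfrog_energy_error(0.01, 0.1)` is astronomically large, while reducing the step geometrically by a factor 2**7 brings the energy error back below 1e-3.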
The long-term goal of this work is the development of high-fidelity simulation tools for dispersive tsunami propagation. A dispersive model is especially important for short wavelength phenomena such as an asteroid impact into the ocean, and is also important in modeling other events where the simpler shallow water equations are insufficient. Adaptive simulations are crucial to bridge the scales from deep ocean to inundation, but have difficulties with the implicit system of equations that results from dispersive models. We propose a fractional step scheme that advances the solution on separate patches with different spatial resolutions and time steps. We show a simulation with 7 levels of adaptive meshes and onshore inundation resulting from a simulated asteroid impact off the coast of Washington. Finally, we discuss a number of open research questions that need to be resolved for high quality simulations.
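The fractional-step idea, advancing different sub-operators of the equation separately within a time step, can be shown on a toy linear system. This is not the dispersive tsunami solver described above; it is a generic Strang-splitting sketch on a 2x2 system with non-commuting parts, verifying the expected second-order convergence (error ratio of about 4 when the step is halved).

```python
import numpy as np

def expm(M):
    """Matrix exponential via eigendecomposition (fine for small dense M)."""
    w, V = np.linalg.eig(M)
    return (V @ np.diag(np.exp(w)) @ np.linalg.inv(V)).real

def strang_error(dt, T=1.0):
    """Global error of Strang splitting e^{B dt/2} e^{A dt} e^{B dt/2}
    against the exact flow of u' = (A + B) u."""
    A = np.array([[0.0, 1.0], [-1.0, 0.0]])    # oscillatory part
    B = np.array([[-0.1, 0.3], [0.0, -0.2]])   # dissipative part; A, B do not commute
    u0 = np.array([1.0, 0.0])
    EB, EA = expm(B * dt / 2), expm(A * dt)    # exact sub-flows
    u = u0.copy()
    for _ in range(round(T / dt)):
        u = EB @ EA @ EB @ u                   # one fractional step
    return np.linalg.norm(u - expm((A + B) * T) @ u0)

ratio = strang_error(0.1) / strang_error(0.05)   # close to 4 for a 2nd-order scheme
```

In the tsunami setting the two sub-steps would be the hyperbolic shallow water update and the implicit dispersive correction, each advanced on its own patch resolution and time step.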
Transition rates, survival probabilities, and quality of bias from time-dependent biased simulations
Simulations with an adaptive time-dependent bias, such as metadynamics, enable an efficient exploration of the conformational space of a system. However, the dynamic information of the system is altered by the bias. With infrequent metadynamics it is possible to recover the transition rate of crossing a barrier, if the collective variables are ideal and there is no bias deposition near the transition state. Unfortunately, for simulations of complex molecules, these conditions are not always fulfilled. To overcome these limitations, and inspired by single-molecule force spectroscopy, we developed a method based on Kramers' theory for calculating the barrier-crossing rate when a time-dependent bias is added to the system. We assess the quality of the bias parameter by measuring how efficiently the bias accelerates the transitions compared to ideal behavior. We present approximate analytical expressions of the survival probability that accurately reproduce the barrier-crossing time statistics, and enable the extraction of the unbiased transition rate even for challenging cases, where previous methods fail.
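The survival-probability analysis described here reduces, in the unbiased Poisson limit, to fitting an exponential decay to the barrier-crossing time statistics. The sketch below is not the paper's time-dependent-bias expressions; it only illustrates that limit on synthetic first-passage times, where S(t) = exp(-k t) and the rate is recovered from the empirical survival curve.

```python
import numpy as np

rng = np.random.default_rng(2)
k_true = 0.5                       # assumed unbiased transition rate
times = rng.exponential(scale=1.0 / k_true, size=5000)   # first-passage times

# Empirical survival probability S(t) = P(crossing time > t)
t_grid = np.linspace(0.0, 10.0, 200)
S = np.array([(times > t).mean() for t in t_grid])

# For Poisson statistics S(t) = exp(-k t); fit the rate from log S
mask = S > 0.05                    # avoid the noisy tail
k_fit = -np.polyfit(t_grid[mask], np.log(S[mask]), 1)[0]
k_mle = 1.0 / times.mean()         # maximum-likelihood estimate for comparison
```

Both estimators recover the planted rate; with a time-dependent bias, the survival probability instead decays with a time-dependent effective rate, which is what the method above models.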
Quadrature by fundamental solutions: kernel-independent layer potential evaluation for large collections of simple objects
Well-conditioned boundary integral methods for the solution of elliptic boundary value problems (BVPs) are powerful tools for static and dynamic physical simulations. When there are many close-to-touching boundaries (e.g., in complex fluids) or when the solution is needed in the bulk, nearly-singular integrals must be evaluated at many targets. We show that precomputing a linear map from surface density to an effective source representation renders this task highly efficient, in the common case where each object is "simple", i.e., its smooth boundary needs only moderately many nodes. We present a kernel-independent method needing only an upsampled smooth surface quadrature, and one dense factorization, for each distinct shape. No (near-)singular quadrature rules are needed. The resulting effective sources are drop-in compatible with fast algorithms, with no local corrections nor bookkeeping. Our extensive numerical tests include 2D FMM-based Helmholtz and Stokes BVPs with up to 1000 objects (281,000 unknowns), and a 3D Laplace BVP with 10 ellipsoids separated by 1/30 of a diameter. We include a rigorous analysis for analytic data in 2D and 3D.
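The key ingredient, one dense precomputation that maps boundary data to an effective source representation valid everywhere outside, can be sketched with a bare method-of-fundamental-solutions example. This is not the paper's QFS construction (no layer-potential density, no upsampling rules); it is a 2D Laplace toy with assumed geometry: a field known on a circular boundary is matched by proxy point sources on an interior circle via a single least-squares solve, then evaluated accurately at exterior targets.

```python
import numpy as np

def lap2d(targets, sources):
    """2D Laplace kernel matrix: G_ij = -log|t_i - s_j| / (2*pi)."""
    d = targets[:, None, :] - sources[None, :, :]
    return -np.log(np.linalg.norm(d, axis=2)) / (2 * np.pi)

m, p = 200, 64                      # boundary nodes, proxy sources
th_b = 2 * np.pi * np.arange(m) / m
th_p = 2 * np.pi * np.arange(p) / p
bdry = 1.5 * np.stack([np.cos(th_b), np.sin(th_b)], axis=1)   # object boundary
proxy = 0.7 * np.stack([np.cos(th_p), np.sin(th_p)], axis=1)  # interior proxy circle

# Reference field: a unit point source inside the object, so the field is
# harmonic everywhere outside; sample it on the boundary only.
src = np.array([[0.2, 0.1]])
u_bdry = lap2d(bdry, src)[:, 0]

# One dense least-squares solve per shape: proxy strengths reproducing the
# boundary data.  This factorization is the precomputable linear map.
strengths, *_ = np.linalg.lstsq(lap2d(bdry, proxy), u_bdry, rcond=None)

# The effective sources are then accurate at any exterior target.
trg = np.array([[3.0, 1.5]])
err = abs((lap2d(trg, proxy) @ strengths)[0] - lap2d(trg, src)[0, 0])
```

Because the effective sources are ordinary point sources, they can be handed directly to a fast multipole method with no near-singular corrections, which is the "drop-in compatible" property claimed above.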
A fast, high-order numerical method for the simulation of single-excitation states in quantum optics
We consider the numerical solution of a nonlocal partial differential equation which models the process of collective spontaneous emission in a two-level atomic system containing a single photon. We reformulate the problem as an integro-differential equation for the atomic degrees of freedom, and describe an efficient solver for the case of a Gaussian atomic density. The problem of history dependence arising from the integral formulation is addressed using sum-of-exponentials history compression. We demonstrate the solver on two systems of physical interest: in the first, an initially-excited atom decays into a photon by spontaneous emission, and in the second, a photon pulse is used to excite an atom, which then decays.
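The principle behind sum-of-exponentials history compression can be shown with a single exponential term. This is not the paper's solver; it is a generic sketch: for an exponential kernel the history integral satisfies an exact one-step recurrence, so each time step costs O(1) instead of re-traversing the whole history, and a sum of such terms approximates more general memory kernels.

```python
import numpy as np

# History integral I(t_n) = int_0^{t_n} exp(-lam*(t_n - s)) f(s) ds.
# A naive evaluation re-traverses the whole history at every step (O(n) work);
# the exponential kernel instead satisfies the exact recurrence
#   I_n = exp(-lam*dt) * I_{n-1} + int_{t_{n-1}}^{t_n} exp(-lam*(t_n - s)) f(s) ds,
# giving O(1) work per step.
lam, dt, N = 2.0, 0.01, 500
t = dt * np.arange(N + 1)
f = np.sin(3 * t)

decay = np.exp(-lam * dt)
I_rec = np.zeros(N + 1)             # O(1)-per-step recurrence
for n in range(1, N + 1):
    panel = 0.5 * dt * (decay * f[n - 1] + f[n])   # trapezoid on the last panel
    I_rec[n] = decay * I_rec[n - 1] + panel

I_dir = np.zeros(N + 1)             # direct O(n)-per-step quadrature
for n in range(1, N + 1):
    g = np.exp(-lam * (t[n] - t[: n + 1])) * f[: n + 1]
    I_dir[n] = dt * (g.sum() - 0.5 * (g[0] + g[-1]))   # composite trapezoid

max_diff = np.max(np.abs(I_rec - I_dir))
```

Both versions use the same panel-wise trapezoid rule, so they agree to roundoff; the recurrence simply avoids the quadratic cost of the direct sum.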