2005 Publications

Deep Network Classification by Scattering and Homotopy Dictionary Learning

John Zarka, Louis Thiry, Tomás Angles, S. Mallat

We introduce a sparse scattering deep convolutional neural network, which provides a simple model to analyze properties of deep representation learning for classification. Learning a single dictionary matrix with a classifier yields a higher classification accuracy than AlexNet over the ImageNet 2012 dataset. The network first applies a scattering transform that linearizes variabilities due to geometric transformations such as translations and small deformations. A sparse $\ell^1$ dictionary coding reduces intra-class variability while preserving class separation through projections over unions of linear spaces. It is implemented in a deep convolutional network with a homotopy algorithm having an exponential convergence. A convergence proof is given in a general framework that includes ALISTA. Classification results are analyzed on ImageNet.
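
The sparse $\ell^1$ dictionary coding step mentioned in this abstract can be illustrated with a minimal sketch. It uses plain ISTA (iterative soft thresholding), a standard swapped-in technique, not the homotopy algorithm or ALISTA discussed by the authors. The function names `soft_threshold` and `ista`, the list-of-rows dictionary layout, and the fixed iteration count are assumptions made for illustration only.

```python
def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal operator of t * ||.||_1)."""
    return [max(abs(a) - t, 0.0) * (1.0 if a > 0 else -1.0) for a in v]


def ista(D, x, lam, n_iter=200):
    """Approximately minimize 0.5 * ||x - D z||^2 + lam * ||z||_1 over z.

    D is a non-zero dictionary given as a list of rows (m x k); x has length m.
    Returns the sparse code z of length k.
    """
    m, k = len(D), len(D[0])
    # Squared Frobenius norm: a safe upper bound on the gradient's Lipschitz constant.
    L = sum(d * d for row in D for d in row)
    z = [0.0] * k
    for _ in range(n_iter):
        # Residual x - D z, then a gradient step followed by soft thresholding.
        residual = [x[i] - sum(D[i][j] * z[j] for j in range(k)) for i in range(m)]
        step = [z[j] + sum(D[i][j] * residual[i] for i in range(m)) / L for j in range(k)]
        z = soft_threshold(step, lam / L)
    return z
```

With an identity dictionary the minimizer is the soft-thresholded input, so small coefficients are set exactly to zero, which is the sparsity effect the abstract refers to.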

Metabolome-Informed Microbiome Analysis Refines Metadata Classifications and Reveals Unexpected Medication Transfer in Captive Cheetahs

J. Gauglitz, J. Morton, A. Tripathi, S. Hansen, M. Gaffney, C. Carpenter, K. Weldon, R. Shah, A. Parampil, A. Fidgett, A. Swafford, R. Knight, P. Dorrenstein

Topological defects determine the structure and function of physical and biological matter over a wide range of scales, from the turbulent vortices in planetary atmospheres, oceans or quantum fluids to bioelectrical signalling in the heart [1, 2, 3] and brain [4], and cell death [5]. Many advances have been made in understanding and controlling the defect dynamics in active [6, 7, 8, 9] and passive [9, 10] non-equilibrium fluids. Yet, it remains unknown whether the statistical laws that govern the dynamics of defects in classical [11] or quantum fluids [12, 13, 14] extend to the active matter [7, 15, 16] and information flows [17, 18] in living systems. Here, we show that a defect-mediated turbulence underlies the complex wave propagation patterns of Rho-GTP signalling protein on the membrane of starfish egg cells, a process relevant to cytoskeletal remodelling and cell proliferation [19, 20]. Our experiments reveal that the phase velocity field extracted from Rho-GTP concentration waves exhibits vortical defect motions and annihilation dynamics reminiscent of those seen in quantum systems [12, 13], bacterial turbulence [15] and active nematics [7]. Several key statistics and scaling laws of the defect dynamics can be captured by a minimal Helmholtz–Onsager point vortex model [21] as well as a generic complex Ginzburg–Landau [22] continuum theory, suggesting a close correspondence between the biochemical signal propagation on the surface of a living cell and a widely studied class of two-dimensional turbulence [23] and wave [22] phenomena.

March 10, 2020

Customer-server population dynamics in heavy traffic

R. Atar, P. Karmakar, D. Lipshutz

We study a many-server queueing model with server vacations, where the population size dynamics of servers and customers are coupled: a server may leave for vacation only when no customers await, and the capacity available to customers is directly affected by the number of servers on vacation. We focus on scaling regimes in which server dynamics and queue dynamics fluctuate at matching time scales, so that their limiting dynamics are coupled. Specifically, we argue that interesting coupled dynamics occur in (a) the Halfin-Whitt regime, (b) the nondegenerate slowdown regime, and (c) the intermediate, near Halfin-Whitt regime; whereas the dynamics asymptotically decouple in the other heavy traffic regimes. We characterize the limiting dynamics, which are different for each scaling regime. We consider relevant respective performance measures for regimes (a) and (b) --- namely, the probability of wait and the slowdown. While closed form formulas for these performance measures have been derived for models that do not accommodate server vacations, it is difficult to obtain closed form formulas for these performance measures in the setting with server vacations. Instead, we propose formulas that approximate these performance measures, and depend on the steady-state mean number of available servers and previously derived formulas for models without server vacations. We test the accuracy of these formulas numerically.
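
As an illustration of a closed-form performance measure for a model without server vacations, the sketch below computes the classical Erlang C probability of wait for an M/M/n queue. Whether this is the specific formula the authors build on is not stated in the abstract, and the function name `erlang_c` and its arguments are hypothetical.

```python
def erlang_c(n_servers, offered_load):
    """Probability that an arriving customer must wait in an M/M/n queue.

    offered_load is the arrival rate divided by the per-server service rate
    and must be strictly smaller than n_servers for the queue to be stable.
    """
    if not 0 < offered_load < n_servers:
        raise ValueError("need 0 < offered_load < n_servers")
    # Erlang B blocking probability via the standard stable recursion.
    b = 1.0
    for k in range(1, n_servers + 1):
        b = offered_load * b / (k + offered_load * b)
    # Convert Erlang B to Erlang C.
    return n_servers * b / (n_servers - offered_load * (1.0 - b))
```

For a single server the result reduces to the utilization, for example `erlang_c(1, 0.5)` gives 0.5.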

Inference of Multisite Phosphorylation Rate Constants and Their Modulation by Pathogenic Mutations

E. Yeung, S. McFann, L. Marsh, E. Dufresne, S. Filippi, H. Harrington, S. Shvartsman, M. Wühr

Multisite protein phosphorylation plays a critical role in cell regulation [1, 2, 3]. It is widely appreciated that the functional capabilities of multisite phosphorylation depend on the order and kinetics of phosphorylation steps, but kinetic aspects of multisite phosphorylation remain poorly understood [4, 5, 6]. Here, we focus on what appears to be the simplest scenario, when a protein is phosphorylated on only two sites in a strict, well-defined order. This scenario describes the activation of ERK, a highly conserved cell-signaling enzyme. We use Bayesian parameter inference in a structurally identifiable kinetic model to dissect dual phosphorylation of ERK by MEK, a kinase that is mutated in a large number of human diseases [7, 8, 9, 10, 11, 12]. Our results reveal how enzyme processivity and efficiencies of individual phosphorylation steps are altered by pathogenic mutations. The presented approach, which connects specific mutations to kinetic parameters of multisite phosphorylation mechanisms, provides a systematic framework for closing the gap between studies with purified enzymes and their effects in the living organism.

The role of delocalized chemical bonding in square-net-based topological semimetals

Sebastian Klemenz, Aurland K. Hay, Samuel M. L. Teicher, Andreas Topp, J. Cano, Leslie M. Schoop

Principles that predict reactions or properties of materials define the discipline of chemistry. In this work, we derive chemical rules, based on atomic distances and chemical bond character, which predict topological materials in compounds that feature the structural motif of a square-net. Using these rules, we identify over 300 potential new topological materials. We show that simple chemical heuristics can be a powerful tool to characterize topological matter. In contrast to previous database-driven materials’ categorization, our approach allows us to identify candidates that are alloys, solid-solutions, or compounds with statistical vacancies. While previous material searches relied on density functional theory, our approach is not limited by this method and could also be used to discover magnetic and statistically disordered topological semimetals.

A morphology-independent search for gravitational wave echoes in data from the first and second observing runs of Advanced LIGO and Advanced Virgo

Ka Wa Tsang, Archisman Ghosh, Anuradha Samajdar, K. Chatziioannou, et al.

Gravitational wave echoes have been proposed as a smoking-gun signature of exotic compact objects with near-horizon structure. Recently there have been observational claims that echoes are indeed present in stretches of data from Advanced LIGO and Advanced Virgo immediately following gravitational wave signals from presumed binary black hole mergers, as well as a binary neutron star merger. In this paper we deploy a morphology-independent search algorithm for echoes introduced in Tsang et al., Phys. Rev. D 98, 024023 (2018), which (a) is able to accurately reconstruct a possible echoes signal with minimal assumptions about their morphology, and (b) computes Bayesian evidences for the hypotheses that the data contain a signal, an instrumental glitch, or just stationary, Gaussian noise. Here we apply this analysis method to all the significant events in the first Gravitational Wave Transient Catalog (GWTC-1), which comprises the signals from binary black hole and binary neutron star coalescences found during the first and second observing runs of Advanced LIGO and Advanced Virgo. In all cases, the ratios of evidences for signal versus noise and signal versus glitch do not rise above their respective "background distributions" obtained from detector noise, the smallest p-value being 3% (for event GW170823). Hence we find no statistically significant evidence for echoes in GWTC-1.

Hyper-molecules: on the representation and recovery of dynamical structures for applications in flexible macro-molecules in cryo-EM

Roy R Lederman, J. Andén, Amit Singer

Cryo-electron microscopy (cryo-EM), the subject of the 2017 Nobel Prize in Chemistry, is a technology for determining the 3-D structure of macromolecules from many noisy 2-D projections of instances of these macromolecules, whose orientations and positions are unknown. The molecular structures are not rigid objects, but flexible objects involved in dynamical processes. The different conformations are exhibited by different instances of the macromolecule observed in a cryo-EM experiment, each of which is recorded as a particle image. The range of conformations and the conformation of each particle are not known a priori; one of the great promises of cryo-EM is to map this conformation space. Remarkable progress has been made in determining rigid structures from homogeneous samples of molecules in spite of the unknown orientation of each particle image and significant progress has been made in recovering a few distinct states from mixtures of rather distinct conformations, but more complex heterogeneous samples remain a major challenge. We introduce the “hyper-molecule” framework for modeling structures across different states of heterogeneous molecules, including continuums of states. The key idea behind this framework is representing heterogeneous macromolecules as high-dimensional objects, with the additional dimensions representing the conformation space. This idea is then refined to model properties such as localized heterogeneity. In addition, we introduce an algorithmic framework for recovering such maps of heterogeneous objects from experimental data using a Bayesian formulation of the problem and Markov chain Monte Carlo (MCMC) algorithms to address the computational challenges in recovering these high dimensional hyper-molecules. We demonstrate these ideas in a prototype applied to synthetic data.

A boundary integral equation approach to computing eigenvalues of the Stokes operator

Travis Askham, M. Rachh

The eigenvalues and eigenfunctions of the Stokes operator have been the subject of intense analytical investigation and have applications in the study and simulation of the Navier–Stokes equations. As the Stokes operator is second order and has the divergence-free constraint, computing these eigenvalues and the corresponding eigenfunctions is a challenging task, particularly in complex geometries and at high frequencies. The boundary integral equation (BIE) framework provides robust and scalable eigenvalue computations due to (a) the reduction in the dimension of the problem to be discretized and (b) the absence of high-frequency “pollution” when using Green’s function to represent propagating waves. In this paper, we detail the theoretical justification for a BIE approach to the Stokes eigenvalue problem on simply- and multiply-connected planar domains, which entails a treatment of the uniqueness theory for oscillatory Stokes equations on exterior domains. Then, using well-established techniques for discretizing BIEs, we present numerical results which confirm the analytical claims of the paper and demonstrate the efficiency of the overall approach.

Optimal tuning of weighted kNN- and diffusion-based methods for denoising single cell genomics data

A. Tjärnberg, O. Mahmood, C. Jackson, G. Saldi, K. Cho, L. Christiaen, R. Bonneau

The analysis of single-cell genomics data presents several statistical challenges, and extensive efforts have been made to produce methods for the analysis of this data that impute missing values, address sampling issues and quantify and correct for noise. In spite of such efforts, no consensus on best practices has been established and all current approaches vary substantially based on the available data and empirical tests. The k-Nearest Neighbor Graph (kNN-G) is often used to infer the identities of, and relationships between, cells and is the basis of many widely used dimensionality-reduction and projection methods. The kNN-G has also been the basis for imputation methods using, e.g., neighbor averaging and graph diffusion. However, due to the lack of an agreed-upon optimal objective function for choosing hyperparameters, these methods tend to oversmooth data, thereby resulting in a loss of information with regard to cell identity and the specific gene-to-gene patterns underlying regulatory mechanisms. In this paper, we investigate the tuning of kNN- and diffusion-based denoising methods with a novel non-stochastic method for optimally preserving biologically relevant informative variance in single-cell data. The framework, Denoising Expression data with a Weighted Affinity Kernel and Self-Supervision (DEWÄKSS), uses a self-supervised technique to tune its parameters. We demonstrate that denoising with optimal parameters selected by our objective function (i) is robust to preprocessing methods using data from established benchmarks, (ii) disentangles cellular identity and maintains robust clusters over dimension-reduction methods, (iii) maintains variance along several expression dimensions, unlike previous heuristic-based methods that tend to oversmooth data variance, and (iv) rarely involves diffusion but rather uses a fixed weighted kNN graph for denoising. 
Together, these findings provide a new understanding of kNN- and diffusion-based denoising methods and serve as a foundation for future research. Code and example data for DEWÄKSS are available at https://gitlab.com/Xparx/dewakss/-/tree/Tjarnberg2020branch.
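
The idea of tuning a weighted kNN denoiser with a self-supervised objective can be sketched as follows. This is not the DEWÄKSS implementation (see the repository above for that). The inverse-distance weights, the exclusion of each cell from its own neighborhood, the mean squared error criterion, and the names `knn_denoise`, `self_supervised_mse` and `pick_k` are all assumptions chosen for illustration.

```python
import math


def knn_denoise(X, k):
    """Replace each row of X by a distance-weighted average of its k nearest other rows."""
    out = []
    for i, xi in enumerate(X):
        # Nearest neighbors by Euclidean distance, excluding the row itself.
        nearest = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)[:k]
        weights = [1.0 / (1.0 + d) for d, _ in nearest]
        total = sum(weights)
        out.append([
            sum(w * X[j][c] for w, (_, j) in zip(weights, nearest)) / total
            for c in range(len(xi))
        ])
    return out


def self_supervised_mse(X, k):
    """Mean squared error between X and its neighbor-only reconstruction."""
    Y = knn_denoise(X, k)
    n = len(X) * len(X[0])
    return sum((a - b) ** 2 for x, y in zip(X, Y) for a, b in zip(x, y)) / n


def pick_k(X, candidates):
    """Choose the neighborhood size with the lowest self-supervised error."""
    return min(candidates, key=lambda k: self_supervised_mse(X, k))
```

Because each row is predicted only from other rows, a neighborhood that is too large averages across distinct groups and raises the error, which gives an objective for avoiding the oversmoothing described in the abstract.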
