2596 Publications

Specificity, synergy, and mechanisms of splice-modifying drugs

Yuma Ishigami, Mandy S. Wong, S. Hanson, et al.

Drugs that target pre-mRNA splicing hold great therapeutic potential, but the quantitative understanding of how these drugs work is limited. Here we introduce mechanistically interpretable quantitative models for the sequence-specific and concentration-dependent behavior of splice-modifying drugs. Using massively parallel splicing assays, RNA-seq experiments, and precision dose-response curves, we obtain quantitative models for two small-molecule drugs, risdiplam and branaplam, developed for treating spinal muscular atrophy. The results quantitatively characterize the specificities of risdiplam and branaplam for 5’ splice site sequences, suggest that branaplam recognizes 5’ splice sites via two distinct interaction modes, and contradict the prevailing two-site hypothesis for risdiplam activity at SMN2 exon 7. The results also show that anomalous single-drug cooperativity, as well as multi-drug synergy, are widespread among small-molecule drugs and antisense-oligonucleotide drugs that promote exon inclusion. Our quantitative models thus clarify the mechanisms of existing treatments and provide a basis for the rational development of new therapies.


NOVA1 acts as an oncogenic RNA-binding protein to regulate cholesterol homeostasis in human glioblastoma cells

Yuhki Saito, C. Park, et al.

NOVA1 is a neuronal RNA-binding protein identified as the target antigen of a rare autoimmune disorder associated with cancer and neurological symptoms, termed paraneoplastic opsoclonus-myoclonus ataxia. Despite the strong association between NOVA1 and cancer, it has been unclear how NOVA1 function might contribute to cancer biology. In this study, we find that NOVA1 acts as an oncogenic factor in a GBM (glioblastoma multiforme) cell line established from a patient. Interestingly, NOVA1 and Argonaute (AGO) CLIP identified common 3′ untranslated region (UTR) targets, which were down-regulated in NOVA1 knockdown GBM cells, indicating a transcriptome-wide intersection of NOVA1 and AGO–microRNA (miRNA) target regulation. NOVA1 binding to 3′UTR targets stabilized transcripts, including those encoding cholesterol homeostasis-related proteins. Selective inhibition of NOVA1–RNA interactions with antisense oligonucleotides disrupted GBM cancer cell fitness. The precision of our GBM CLIP studies points to both the mechanism and the precise RNA sequence sites needed to selectively inhibit oncogenic NOVA1–RNA interactions. Taken together, we find that NOVA1 is commonly overexpressed in GBM, where it can antagonize AGO2–miRNA actions and consequently up-regulate cholesterol synthesis, promoting cell viability.


Neural Manifold Capacity Captures Representation Geometry, Correlations, and Task-Efficiency Across Species and Behaviors

C. Chou, Luke Arend, Albert J. Wakhloo, Royoung Kim, Will Slatton, S. Chung

The study of the brain encompasses multiple scales, including temporal, spatial, and functional aspects. Integrating understanding across these different levels and modalities requires developing quantification methods and frameworks. Here, we present effective Geometric measures from Correlated Manifold Capacity theory (GCMC) for probing the functional structure in neural representations. We utilize a statistical physics approach to establish analytical connections between neural co-variabilities and downstream read-out efficiency. These effective geometric measures capture both stimulus-driven and behavior-driven structures in neural population activities, while extracting computationally relevant information from neural data into intuitive and interpretable analysis descriptors. We apply GCMC to a diverse collection of datasets with different recording methods, various model organisms, and multiple task modalities. Specifically, we demonstrate that GCMC enables a wide range of multi-scale data analysis. This includes quantifying the spatial progression of encoding efficiency across brain regions, revealing the temporal dynamics of task-relevant manifold geometry in information processing, and characterizing variances as well as invariances in neural representations throughout learning. Lastly, the effective manifold geometric measures may be viewed as order parameters for phases related to computational efficiency, facilitating data-driven hypothesis generation and latent embedding.


Statistical Component Separation for Targeted Signal Recovery in Noisy Mixtures

B. Régaldo-Saint Blancard, M. Eickenberg

Separating signals from an additive mixture may be an unnecessarily hard problem when one is only interested in specific properties of a given signal. In this work, we tackle simpler "statistical component separation" problems that focus on recovering a predefined set of statistical descriptors of a target signal from a noisy mixture. Assuming access to samples of the noise process, we investigate a method devised to match the statistics of the solution candidate corrupted by noise samples with those of the observed mixture. We first analyze the behavior of this method using simple examples with analytically tractable calculations. Then, we apply it in an image denoising context employing 1) wavelet-based descriptors and 2) ConvNet-based descriptors on astrophysics and ImageNet data. In the case of 1), we show that our method better recovers the descriptors of the target data than a standard denoising method in most situations. Additionally, despite not being constructed for this purpose, it performs surprisingly well in terms of peak signal-to-noise ratio on full signal reconstruction. In comparison, representation 2) appears less suitable for image denoising. Finally, we extend this method by introducing a diffusive stepwise algorithm which gives a new perspective on the initial method and leads to promising results for image denoising under specific circumstances.
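
The matching idea described in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: the descriptor (mean squared amplitude), the sinusoidal test signal, the step size, and the function names `power`, `match_statistics`, and `demo` are all assumptions chosen for illustration. Starting from the observed mixture, the candidate is updated by gradient descent so that the descriptor of the candidate plus fresh noise samples matches the descriptor of the mixture.

```python
import math
import random


def power(x):
    """Toy descriptor (an assumption): mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def match_statistics(y, noise_samples, steps=150, lr=10.0):
    """Gradient descent on a candidate u, initialised at the mixture y, so that
    power(u + noise) matches power(y) on average over the noise samples."""
    n = len(y)
    target = power(y)
    u = list(y)
    for _ in range(steps):
        grad = [0.0] * n
        for eps in noise_samples:
            noisy = [a + b for a, b in zip(u, eps)]
            residual = power(noisy) - target
            # d/du_k of residual**2 is 2 * residual * (2 / n) * (u_k + eps_k);
            # the extra division averages the loss over the noise samples.
            scale = 4.0 * residual / (n * len(noise_samples))
            for k in range(n):
                grad[k] += scale * noisy[k]
        # The 1/n factor in the descriptor makes gradients small, hence the
        # comparatively large step size.
        u = [a - lr * g for a, g in zip(u, grad)]
    return u


def demo(seed=0, n=512, n_noise=8):
    """Return the descriptor of the clean signal, the mixture, and the estimate."""
    rng = random.Random(seed)
    signal = [math.sqrt(2.0) * math.sin(2.0 * math.pi * k / 32.0) for k in range(n)]
    mixture = [s + rng.gauss(0.0, 1.0) for s in signal]
    noise_samples = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_noise)]
    estimate = match_statistics(mixture, noise_samples)
    return power(signal), power(mixture), power(estimate)
```

Calling `demo()` returns three numbers. In this toy setting the descriptor of the estimate is expected to land closer to that of the clean signal than the descriptor of the raw mixture does, because the noise contribution to the power is accounted for rather than absorbed into the estimate.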


Training self-learning circuits for power-efficient solutions

Menachem Stern, Douglas J. Durian, Andrea J. Liu, et al.

As the size and ubiquity of artificial intelligence and computational machine learning models grow, the energy required to train and use them is rapidly becoming economically and environmentally unsustainable. Recent laboratory prototypes of self-learning electronic circuits, such as “physical learning machines,” open the door to analog hardware that directly employs physics to learn desired functions from examples at a low energy cost. In this work, we show that this hardware platform allows for an even further reduction in energy consumption by using good initial conditions and a new learning algorithm. Using analytical calculations, simulations, and experiments, we show that a trade-off emerges when learning dynamics attempt to minimize both the error and the power consumption of the solution—greater power reductions can be achieved at the cost of decreasing solution accuracy. Finally, we demonstrate a practical procedure to weigh the relative importance of error and power minimization, improving the power efficiency given a specific tolerance to error.


Responses of neurons in macaque V4 to object and texture images

Justin D. Lieber, T. D. Oleskiw, E. P. Simoncelli, J. A. Movshon

Humans and monkeys can effortlessly recognize objects in everyday scenes. This ability relies on neural computations in the ventral stream of visual cortex. The intermediate computations that lead to object selectivity are not well understood, but previous studies implicate V4 as an early site of selectivity for object shape. To explore the mechanisms of this selectivity, we generated a continuum of images between “scrambled” textures and photographic images of both natural and manmade environments, using techniques that preserve the local statistics of the original image while discarding information about scene and shape. We measured the responses of single units in awake macaque V4 to these images. On average, V4 neurons were slightly more active in response to photographic images than to their scrambled counterparts. However, responses in V4 varied widely both across different cells and different sets of images. An important determinant of this variation was the effectiveness of image families at driving strong neural responses. Across the full V4 population, a cell’s average evoked firing rate for a family reliably predicted that family’s preference for photographic over scrambled images. Accordingly, the cells that respond most strongly to each image family showed a much stronger difference between photographic and scrambled images and a graded level of modulation for images scrambled at intermediate levels. This preference for photographic images was not evident until ∼50 ms after the onset of neuronal activity and did not peak in strength until 140 ms after activity onset. Finally, V4 neural responses seemed to categorically separate photographic images from all of their scrambled counterparts, despite the fact that the least scrambled images in our set appear similar to the originals. When these same images were analyzed with DISTS (Deep Image Structure and Texture Similarity), an image-computable similarity metric that predicts human judgements of image degradation, this same pattern emerged. This suggests that V4 responses are highly sensitive to small deviations from photographic image structure.


Tapioca: a platform for predicting de novo protein–protein interactions in dynamic contexts

Tavis J. Reed, Matthew D. Tyl, O. Troyanskaya, et al.

Protein–protein interactions (PPIs) drive cellular processes and responses to environmental cues, reflecting the cellular state. Here we develop Tapioca, an ensemble machine learning framework for studying global PPIs in dynamic contexts. Tapioca predicts de novo interactions by integrating mass spectrometry interactome data from thermal/ion denaturation or cofractionation workflows with protein properties and tissue-specific functional networks. Focusing on the thermal proximity coaggregation method, we improved the experimental workflow. Finely tuned thermal denaturation afforded increased throughput, while cell lysis optimization enhanced protein detection from different subcellular compartments. The Tapioca workflow was next leveraged to investigate viral infection dynamics. Temporal PPIs were characterized during the reactivation from latency of the oncogenic Kaposi’s sarcoma-associated herpesvirus. Together with functional assays, NUCKS was identified as a proviral hub protein, and a broader role was uncovered by integrating PPI networks from alpha- and betaherpesvirus infections. Altogether, Tapioca provides a web-accessible platform for predicting PPIs in dynamic contexts.


Nonlinear spiked covariance matrices and signal propagation in deep neural networks

Zhichao Wang, D. Wu, Zhou Fan

Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) defined by the nonlinear feature map of a feedforward neural network. However, existing results only establish weak convergence of the empirical eigenvalue distribution, and fall short of providing precise quantitative characterizations of the “spike” eigenvalues and eigenvectors that often capture the low-dimensional signal structure of the learning problem. In this work, we characterize these signal eigenvalues and eigenvectors for a nonlinear version of the spiked covariance model, including the CK as a special case. Using this general result, we give a quantitative description of how spiked eigenstructure in the input data propagates through the hidden layers of a neural network with random weights. As a second application, we study a simple regime of representation learning where the weight matrix develops a rank-one signal component over training and characterize the alignment of the target function with the spike eigenvector of the CK on test data.


To be or not to be: orb, the fusome and oocyte specification in Drosophila

In the fruit fly Drosophila melanogaster, two cells in a cyst of 16 interconnected cells have the potential to become the oocyte, but only one of these will assume an oocyte fate as the cysts transition through regions 2a and 2b of the germarium. The mechanism of specification depends on a polarized microtubule network, a dynein dependent Egl:BicD mRNA cargo complex, a special membranous structure called the fusome and its associated proteins, and the translational regulator orb. In this work, we have investigated the role of orb and the fusome in oocyte specification. We show here that specification is a stepwise process. Initially, orb mRNAs accumulate in the two pro-oocytes in close association with the fusome. This association is accompanied by the activation of the orb autoregulatory loop, generating high levels of Orb. Subsequently, orb mRNAs become enriched in only one of the pro-oocytes, the presumptive oocyte, and this is followed, with a delay, by Orb localization to the oocyte. We find that fusome association of orb mRNAs is essential for oocyte specification in the germarium, is mediated by the orb 3′ UTR, and requires Orb protein. We also show that the microtubule minus end binding protein Patronin functions downstream of orb in oocyte specification. Finally, in contrast to a previously proposed model for oocyte selection, we find that the choice of which pro-oocyte becomes the oocyte does not seem to be predetermined by the amount of fusome material in these two cells, but instead depends upon a competition for orb gene products.

February 12, 2024

For how many iterations should we run Markov chain Monte Carlo?

C. Margossian, Andrew Gelman

Standard Markov chain Monte Carlo (MCMC) admits three fundamental control parameters: the number of chains, the length of the warmup phase, and the length of the sampling phase. These control parameters play a large role in determining the amount of computation we deploy. In practice, we need to walk a line between achieving sufficient precision and not wasting precious computational resources and time. We review general strategies to check the length of the warmup and sampling phases, and examine the three control parameters of MCMC in the contexts of CPU- and GPU-based hardware. Our discussion centers around three tasks: (1) inference about a latent variable, (2) computation of expectation values and quantiles, and (3) diagnostics to check the reliability of the estimators. This chapter begins with general recommendations on the control parameters of MCMC, which have been battle-tested over the years and often motivate defaults in Bayesian statistical software. Usually we do not know ahead of time how a sampler will interact with a target distribution, and so the choice of MCMC algorithm and its control parameters tends to be based on experience and is re-evaluated after simulations have been obtained and analyzed. The second part of this chapter provides a theoretical motivation for our recommended approach, with pointers to some concerns and open problems. We also examine recent developments on the algorithmic and hardware fronts, which motivate new computational approaches to MCMC.
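
As an illustration of the kind of multi-chain diagnostic this abstract alludes to, the sketch below computes the classic potential scale reduction factor (R-hat), which compares between-chain and within-chain variance. The abstract does not say which diagnostics the chapter recommends, so the choice of R-hat here is an assumption, as are the names `r_hat` and `simulate_chains`. The "chains" are independent Gaussian draws standing in for real sampler output.

```python
import math
import random
import statistics


def r_hat(chains):
    """Potential scale reduction factor for equal-length chains of scalar draws."""
    n = len(chains[0])
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the per-chain means.
    between_over_n = statistics.variance([statistics.fmean(c) for c in chains])
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


def simulate_chains(offsets, n_draws=1000, seed=0):
    """Stand-in for MCMC output (an assumption): one chain per offset, each made
    of independent unit-variance Gaussian draws centred on that offset."""
    rng = random.Random(seed)
    return [[off + rng.gauss(0.0, 1.0) for _ in range(n_draws)] for off in offsets]


agreeing = r_hat(simulate_chains([0.0, 0.0, 0.0, 0.0]))
disagreeing = r_hat(simulate_chains([0.0, 1.0, 2.0, 3.0]))
```

Chains that sample the same distribution give a value close to 1, while chains stuck around different locations give a value well above 1, which suggests a longer warmup phase or a different sampler. Adding chains gives the between-chain variance more to work with, which is one way the number of chains interacts with the other two control parameters.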
