Speaker: Zhicheng Pan (University of California – Los Angeles)
Topic: Machine Learning Strategies for Alternative Splicing
Alternative splicing (AS) is a fundamental biological process that diversifies transcriptomes and proteomes. Aberrant splicing is a major cause of rare diseases and cancers. Advances in next-generation sequencing have accelerated the discovery of AS events and the accumulation of large-scale RNA-seq datasets across diverse biological states. However, using this large body of data to make biological discoveries remains a challenge. We developed machine-learning-based methods that interrogate various types of RNA-seq datasets and transform them into biological knowledge that can facilitate discoveries about the regulatory mechanisms and functional consequences of AS.
In the first part of the presentation, I will present SIRI, a computational workflow for quantifying unspliced introns during cell development. The steps of mRNA maturation occur in distinct cellular locations, yet the subcellular distribution of processed and unprocessed transcripts is often overlooked in transcriptomic analyses. We used SIRI to track mRNA maturation by measuring intron levels across subcellular locations in mouse embryonic stem cells, neuronal progenitor cells, and postmitotic neurons. We identified four intron groups with distinct patterns of RNA enrichment across subcellular locations. Using a deep-learning-based computational framework, we identified a set of triplet motifs and sequence conservation patterns that are predictive of intron behavior across cell development.
In the second part of the presentation, I will introduce iDARTS, a general deep-learning-based framework that can accurately predict splicing levels and variants’ effects on splicing across human tissues and cell types. Identifying genomic variants that causally impact AS has been a long-standing challenge. We trained iDARTS using population-scale genome and RNA-seq data from 8,304 samples across 53 tissues from the GTEx project. Through computational and experimental analyses, we demonstrate that iDARTS can help identify the causal effects of common and rare genomic variants on splicing. Moreover, by analyzing ~10 million intronic and exonic variants, we characterized functionally relevant splice-altering variants that are related to cancers and other diseases. This enables ab initio prediction of disease-implicated variants, thereby helping to interpret variants of uncertain significance in clinical studies.