Presenter: Ren Yi, Ph.D. Candidate, New York University
Topic: Methods to Improve Knowledge Transfer Efficiency for Data-limited Problems in Genomics
Abstract: Recent advances in computational genomics have largely benefited from the explosion of high-throughput genomic data and from comparable growth in biological databases. However, as more sequencing technologies become available and large genomic consortia crowdsource data from larger cohorts of research groups, data heterogeneity has become an increasingly prominent issue. Integrating data across multiple sources and modalities is becoming particularly important for a growing number of biological systems. In addition, high-throughput omics data are typically highly skewed towards a small number of model organisms, factors, and conditions for which wet-lab experiments have higher success rates. This skew introduces further technical challenges when building machine learning models for problems with limited data. This thesis describes methods that improve knowledge transfer efficiency for data-limited problems through efficient task-specific feature representations in the multitask learning setting. We demonstrate the performance of our methods on two genomic problems: genetic variant calling and cell type-specific transcription factor binding prediction.