SMBp Seminar: Paul Kim

Speaker: Paul Kim, Software Engineer, Simons Machine Learning Center

Title: Learning to automate Cryo-EM data collection with Ptolemy

Abstract: Over the past decade, cryogenic electron microscopy (cryo-EM) has emerged as a primary method for determining near-native, near-atomic-resolution 3D structures of biological macromolecules. To meet the increasing demand for cryo-EM, automated methods that improve throughput and efficiency while lowering costs are needed. Currently, data collection, the process of acquiring high-magnification cryo-EM micrographs, requires human input and manual parameter tuning: expert operators must navigate low- and medium-magnification images to find good locations for high-magnification collection. Automating this step is non-trivial, because the images have a low signal-to-noise ratio and are affected by a range of experimental parameters that can differ from one collection session to the next. Here, we use various computer vision algorithms, including mixture models, convolutional neural networks (CNNs), and U-Nets, to develop the first pipeline that automates low- and medium-magnification targeting with purpose-built algorithms. The learned models in this pipeline are trained on a large internal dataset of images from real-world cryo-EM data collection sessions, labeled with the locations selected by operators. Using these models, we show that we can effectively detect and classify regions of interest (ROIs) in low- and medium-magnification images, and that the models generalize to unseen sessions as well as to images captured on different microscopes at external facilities. We expect that our pipeline, Ptolemy, will be immediately useful as a tool for automating cryo-EM data collection and will serve as a foundation for future methods for efficient, automated cryo-EM.

About the Speaker

Paul is a Software Engineer at the SMLC. He studied Statistics, with a minor in Chemistry, as an undergraduate at UC Berkeley. He then worked as an ML Researcher on Bayer's Machine Learning Research team in Berlin, headed by Djork-Arné Clevert. He is taking a year off from the Carnegie Mellon Computational Biology MS program to work on computer vision and automation problems in cryo-EM at the SMLC. He is broadly interested in the intersection of machine learning and biology, especially as it pertains to proteins and drug discovery.
