Presenter: Aaron Watters, Ph.D.,
Senior Software Engineer, Flatiron Institute
Title: Landscapes of Binary Classification Ranking
Abstract: This talk informally explores and compares methods for evaluating binary classifiers,
and how different evaluation metrics suit different purposes. The goal is to
understand what sort of rankings are preferred by metrics such as the area under the receiver operating characteristic (ROC)
curve, the area under the precision/recall curve, F1-score-based metrics, and
two metrics that, as far as I know, are new: average true position and median true position.
We will explore what it means for a ranking to be stable under missing true values, and how stable
each metric is when some true items are unknown and are therefore labeled as false in the “gold standard”. Interactive graphical visualizations will support the discussion and help build intuition.
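The abstract does not define these metrics precisely, so the following is only a minimal Python sketch under stated assumptions. A ranking is modeled as a list of booleans ordered from highest to lowest classifier score. ROC AUC is computed as the fraction of correctly ordered (true, false) pairs. The area under the precision/recall curve is approximated by average precision. The "F1-score-based metric" is taken to be the best F1 over all cutoffs. "Average true position" and "median true position" are guessed to be the mean and median 1-based rank of the true items, where lower is better; these two definitions are hypothetical and may not match the talk. The helper `hide_trues` (also hypothetical) mimics the stability experiment by relabeling a random fraction of true items as false.

```python
import random
import statistics


def true_positions(ranking):
    """1-based positions of the items labeled True (position 1 is the top)."""
    return [i + 1 for i, is_true in enumerate(ranking) if is_true]


def roc_auc(ranking):
    """Fraction of (true, false) pairs where the true item is ranked above the false one."""
    n_pos = sum(ranking)
    n_neg = len(ranking) - n_pos
    falses_below = n_neg  # false items not yet passed while scanning from the top
    correctly_ordered = 0
    for is_true in ranking:
        if is_true:
            correctly_ordered += falses_below
        else:
            falses_below -= 1
    return correctly_ordered / (n_pos * n_neg)


def average_precision(ranking):
    """Mean of precision@k taken at the position k of each true item."""
    positions = true_positions(ranking)
    return statistics.mean(hits / k for hits, k in enumerate(positions, start=1))


def max_f1(ranking):
    """Best F1 over all cutoffs k, treating the top k items as predicted positives."""
    n_pos = sum(ranking)
    best, tp = 0.0, 0
    for k, is_true in enumerate(ranking, start=1):
        tp += is_true
        # F1 = 2*tp / (2*tp + fp + fn) = 2*tp / (k + n_pos)
        best = max(best, 2 * tp / (k + n_pos))
    return best


def mean_true_position(ranking):
    """Hypothetical reading of 'average true position': mean rank of true items."""
    return statistics.mean(true_positions(ranking))


def median_true_position(ranking):
    """Hypothetical reading of 'median true position': median rank of true items."""
    return statistics.median(true_positions(ranking))


def hide_trues(ranking, fraction, rng):
    """Copy of ranking with a random fraction of true labels flipped to False.

    Assumes at least one true label; at least one is always kept so the
    metrics above remain defined.
    """
    positions = [i for i, is_true in enumerate(ranking) if is_true]
    n_hide = min(int(fraction * len(positions)), len(positions) - 1)
    hidden = set(rng.sample(positions, n_hide))
    return [is_true and i not in hidden for i, is_true in enumerate(ranking)]


METRICS = {
    "ROC AUC": roc_auc,
    "average precision": average_precision,
    "max F1": max_f1,
    "mean true position": mean_true_position,
    "median true position": median_true_position,
}

if __name__ == "__main__":
    ranking = [True, True, False, True, False, False, True, False, False, False]
    degraded = hide_trues(ranking, 0.5, random.Random(0))
    for name, metric in METRICS.items():
        print(f"{name:>22}: {metric(ranking):.3f} -> {metric(degraded):.3f}")
```

Running the script prints each metric before and after half of the true labels are hidden. Comparing how far each value moves gives a rough, single-sample feel for the stability question; a fuller experiment would average over many random seeds and hiding fractions.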