Alexandr Andoni, Ph.D.Columbia University
Interdisciplinary lectures are open to the public and are held at the Gerald D. Fischbach Auditorium at the Simons Foundation headquarters in New York City. Tea is served prior to each lecture.
How does one efficiently search for similar items in a large dataset, say, of images? This is a classic algorithmic task that arises when dealing with massive datasets. Its applications span many areas, including information retrieval, machine learning, data mining, computer vision, signal processing, bioinformatics, etc. For example, this task underlies the classic ‘nearest neighbor’ classification rule in machine learning. The natural solution for similarity search — to scan the entire dataset, comparing a given item to each item in the dataset — is prohibitively slow for modern datasets.
Alexandr Andoni will describe how efficient solutions for similarity search benefit from the tools and perspectives of high-dimensional geometry. The latter emerged as the natural language of similarity search: e.g., a 20 x 20 pixel image naturally corresponds to a 400-dimensional vector, one coordinate per pixel. Andoni will show how geometric tools such as dimension reduction and randomized space partitions enable efficient similarity search algorithms.