Simons Collaboration on Algorithms & Geometry 2017 Annual Meeting

Date & Time


Tengyu Ma Princeton University
Jelani Nelson Harvard University
Moritz Hardt Google
Irit Dinur Weizmann Institute of Science
Alex Andoni Columbia University


Tengyu Ma
Princeton University

Better understanding of non-convex methods in machine learning

Recent breakthroughs in machine learning, especially deep learning, often involve learning complex and high-dimensional models on massive datasets with non-convex optimization. Though empirically successful, the formal study of such non-convex methods is much less developed.

The talk will showcase a few results that aim to develop new algorithmic approaches and analysis tools in these settings. First, we show that matrix completion — a famous problem in machine learning — can be solved by stochastic gradient descent on the straightforward non-convex objective function in polynomial time. (Formally, we show that all local minima of the objective are also global minima.) Then, we will analyze the landscape of the objective functions for linearized residual nets, and demonstrate that over-parameterization and re-parameterization of the models can make the optimization easier.

Based on joint works with Rong Ge, Jason D. Lee, and Moritz Hardt. No prior knowledge on machine learning is assumed.


Jelani Nelson
Harvard University

Optimality of the Johnson-Lindenstrauss lemma

Dimensionality reduction in Euclidean space, as attainable by the Johnson-Lindenstrauss lemma (also known as “random projections”), has been a fundamental tool in algorithm design and machine learning. The JL lemma states that any n points in Euclidean space can be mapped to m-dimensional Euclidean space while preserving all pairwise distances up to 1+epsilon, where m only needs to be on the order of (log n) /epsilon^2, independent of the original dimension. In this talk, I discuss our recent proof that the JL lemma is optimal, in the sense that for any n there are point sets of size n such that no embedding providing (1+epsilon)-distortion exists into a dimension that is more than a constant factor better than what the JL lemma guarantees. I will also discuss some subsequent work and future directions.

Joint work with Kasper Green Larsen (Aarhus University).


Moritz Hardt

Gradient Descent Learns Linear Dynamical Systems

We prove that gradient descent efficiently converges to the global optimizer of the maximum likelihood objective of an unknown linear time-invariant dynamical system from a sequence of noisy observations generated by the system. Even though objective function is non-convex, we provide polynomial running time and sample complexity bounds under strong but natural assumptions. Linear systems identification has been studied for many decades, yet, to the best of our knowledge, these are the first polynomial guarantees for the problem we consider.

Joint work with Tengyu Ma and Ben Recht.


Afonso Bandiera
New York University

On Phase Transitions for Spiked Random Matrix and Tensor Models
A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, in which a prominent eigenvector (or low rank structure) is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences, where the goal is often to recover or detect the planted low rank structured. In this talk we discuss fundamental limitations of statistical methods to perform these tasks and methods that outperform PCA at it. Emphasis will be given to low rank structures arising in Synchronization problems. Time permitting, analogous results for spiked tensor models will also be discussed.

Joint work with Amelia Perry, Alex Wein, and Ankur Moitra.


Irit Dinur
Weizmann Institute of Science

Agreement testing and PCPs

I will describe the notion of agreement testing, which allows to deduce global structure from local agreement checks. In retrospect, agreement testing theorems are a key combinatorial component in nearly all PCPs.

Recently, we have discovered a connection between agreement tests and high dimensional expanders. Namely, every high dimensional expander gives rise to an agreement theorem. I will describe (some properties of) the fascinating bounded degree high dimensional expanders that were constructed by Lubotzky, Samuels, and Vishne. Since these objects are sparse, they imply a strong derandomization of direct product and direct sum testing.

Based on joint work with Tali Kaufman.


Alex Andoni
Columbia University

The Geometry of Similarity Search

How does one efficiently search for similar items in a large dataset, say, of images? This is a classic algorithmic task that arises when dealing with massive datasets. Its applications span many areas, including information retrieval, machine learning, data mining, computer vision, signal processing, bioinformatics, etc. For example, this task underlies the classic ‘nearest neighbor’ classification rule in machine learning. The natural solution for similarity search — to scan the entire dataset, comparing a given item to each item in the dataset — is prohibitively slow for modern datasets.

Andoni will describe how efficient solutions for similarity search benefit from the tools and perspectives of high-dimensional geometry. The latter emerged as the natural language of similarity search: e.g., a 20 x 20 pixel image naturally corresponds to a 400-dimensional vector, one coordinate per pixel. Andoni will show how geometric tools such as dimension reduction and randomized space partitions enable efficient similarity search algorithms.

Subscribe to MPS announcements and other foundation updates