|Tengyu Ma||Princeton University|
|Jelani Nelson||Harvard University|
|Irit Dinur||Weizmann Institute of Science|
|Alex Andoni||Columbia University|
Better understanding of non-convex methods in machine learning
Recent breakthroughs in machine learning, especially deep learning, often involve learning complex and high-dimensional models on massive datasets with non-convex optimization. Though empirically successful, the formal study of such non-convex methods is much less developed.
The talk will showcase a few results that aim to develop new algorithmic approaches and analysis tools in these settings. First, we show that matrix completion — a famous problem in machine learning — can be solved by stochastic gradient descent on the straightforward non-convex objective function in polynomial time. (Formally, we show that all local minima of the objective are also global minima.) Then, we will analyze the landscape of the objective functions for linearized residual nets, and demonstrate that over-parameterization and re-parameterization of the models can make the optimization easier.
Based on joint works with Rong Ge, Jason D. Lee, and Moritz Hardt. No prior knowledge on machine learning is assumed.
Optimality of the Johnson-Lindenstrauss lemma
Dimensionality reduction in Euclidean space, as attainable by the Johnson-Lindenstrauss lemma (also known as “random projections”), has been a fundamental tool in algorithm design and machine learning. The JL lemma states that any n points in Euclidean space can be mapped to m-dimensional Euclidean space while preserving all pairwise distances up to 1+epsilon, where m only needs to be on the order of (log n) /epsilon^2, independent of the original dimension. In this talk, I discuss our recent proof that the JL lemma is optimal, in the sense that for any n there are point sets of size n such that no embedding providing (1+epsilon)-distortion exists into a dimension that is more than a constant factor better than what the JL lemma guarantees. I will also discuss some subsequent work and future directions.
Joint work with Kasper Green Larsen (Aarhus University).
Gradient Descent Learns Linear Dynamical Systems
We prove that gradient descent efficiently converges to the global optimizer of the maximum likelihood objective of an unknown linear time-invariant dynamical system from a sequence of noisy observations generated by the system. Even though objective function is non-convex, we provide polynomial running time and sample complexity bounds under strong but natural assumptions. Linear systems identification has been studied for many decades, yet, to the best of our knowledge, these are the first polynomial guarantees for the problem we consider.
Joint work with Tengyu Ma and Ben Recht.
New York University
On Phase Transitions for Spiked Random Matrix and Tensor Models
A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, in which a prominent eigenvector (or low rank structure) is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences, where the goal is often to recover or detect the planted low rank structured. In this talk we discuss fundamental limitations of statistical methods to perform these tasks and methods that outperform PCA at it. Emphasis will be given to low rank structures arising in Synchronization problems. Time permitting, analogous results for spiked tensor models will also be discussed.
Joint work with Amelia Perry, Alex Wein, and Ankur Moitra.
Weizmann Institute of Science
Agreement testing and PCPs
I will describe the notion of agreement testing, which allows to deduce global structure from local agreement checks. In retrospect, agreement testing theorems are a key combinatorial component in nearly all PCPs.
Recently, we have discovered a connection between agreement tests and high dimensional expanders. Namely, every high dimensional expander gives rise to an agreement theorem. I will describe (some properties of) the fascinating bounded degree high dimensional expanders that were constructed by Lubotzky, Samuels, and Vishne. Since these objects are sparse, they imply a strong derandomization of direct product and direct sum testing.
Based on joint work with Tali Kaufman.
The Geometry of Similarity Search
How does one efficiently search for similar items in a large dataset, say, of images? This is a classic algorithmic task that arises when dealing with massive datasets. Its applications span many areas, including information retrieval, machine learning, data mining, computer vision, signal processing, bioinformatics, etc. For example, this task underlies the classic ‘nearest neighbor’ classification rule in machine learning. The natural solution for similarity search — to scan the entire dataset, comparing a given item to each item in the dataset — is prohibitively slow for modern datasets.
Andoni will describe how efficient solutions for similarity search benefit from the tools and perspectives of high-dimensional geometry. The latter emerged as the natural language of similarity search: e.g., a 20 x 20 pixel image naturally corresponds to a 400-dimensional vector, one coordinate per pixel. Andoni will show how geometric tools such as dimension reduction and randomized space partitions enable efficient similarity search algorithms.