Simons Collaboration on Algorithms & Geometry 2017 Annual Meeting

Speakers

Tengyu Ma    Princeton University
Jelani Nelson        Harvard University
Moritz Hardt    Google
Afonso Bandeira    New York University
Irit Dinur    Weizmann Institute of Science
Alex Andoni    Columbia University

 

  • Meeting Report

    As in previous years, the A&G annual meeting started with conference-style talks on recent important progress and ended with a public talk by an A&G member. Irit Dinur started the day with a talk that presented an interesting connection between the agreement-testing problem that occurs in complexity theory (mainly in probabilistically checkable proofs) and the expansion of simplicial complexes. Here, the notion of expansion has more than one possible interpretation, and Dinur formulated a notion that is suitable for the agreement-testing problem. This notion holds for simplicial complexes that arise from the Ramanujan complexes of Lubotzky, Samuels and Vishne, thus leading to new strong derandomization results.

    Tengyu Ma presented rigorous results that explain the empirically observed success of nonconvex optimization on typical instances that arise from certain probability distributions on inputs. He demonstrated this by showing that for certain problems and certain distributions, with high probability every local minimum is also a global minimum. Such issues are ubiquitous in learning theory, and this result is a step toward explaining the success of currently used heuristic methods. Jelani Nelson presented his recent solution of the longstanding question of the optimality of the Johnson-Lindenstrauss dimensionality reduction lemma. This 1984 lemma, which is a powerful tool in many applications, asserts that for every \(\epsilon>0\) and \(n\) vectors \(x_1,\dots,x_n\in R^n\), there exist \(k= O(\epsilon^{-2}\log n)\) and vectors \(y_1,\dots,y_n\in R^k\) such that $$(1-\epsilon)\|x_i-x_j\| \leq \|y_i-y_j\| \leq (1+\epsilon)\|x_i-x_j\|$$ for all \(i,j\). The dependence of the reduced dimension \(k\) on the number of vectors \(n\) is easily seen to be sharp, but it was unknown whether the dependence on \(\epsilon\) as \(\epsilon\rightarrow 0\) is sharp. Nelson proved that this dependence on \(\epsilon\) is sharp for all values of \(\epsilon\) that are not very small (namely, not smaller than a power of \(1/n\)). This is based on a new and clever coding scheme.

    Moritz Hardt presented interesting work about learning an unknown, linear, time-invariant dynamical system from a sequence of noisy observations generated by the system. This work relates the question to certain issues in complex analysis. These results are a thought-provoking first step toward a rigorous theory for linear system identification problems, which have been investigated for a long time using heuristic methods. The final talk of the nonpublic part of the A&G annual meeting was by Afonso Bandeira, who discussed questions about random matrix models in which a low-rank matrix is added to a random matrix. The question is how large the low-rank perturbation must be in order for it to be statistically detectable. Such questions arise in natural statistical models for principal component analysis, and Bandeira’s talk demonstrated fundamental limitations of statistical methods for performing these tasks.

    The day ended with a public talk by Alex Andoni on nearest-neighbor search, which is a very important algorithmic task with a wide range of applications in many areas. Such questions are being investigated intensively at A&G (including very recent and ongoing progress), but this talk was focused more on an introduction to the topic for nonspecialists. The talk was attended by many people, and there was a lot of audience interest and participation. I don’t know the vast majority of those who came (you have the list of registrants, so you may know more), but based on the questions that were asked throughout the talk, it seems that they were people from either industry or academia who are interested in such topics. The overall theme of Andoni’s talk was how to build data structures that answer approximate nearest-neighbor queries in sublinear time. The methods here are geometric, and quite remarkably, they yield efficient procedures that work irrespective of the input point set. Andoni presented the fundamental questions in this context, as well as some of the fundamental techniques (including his recent work on data-dependent locality-sensitive hashing).

Talks


Tengyu Ma
Princeton University

Better understanding of non-convex methods in machine learning


Recent breakthroughs in machine learning, especially deep learning, often involve learning complex and high-dimensional models on massive datasets with non-convex optimization. Though these methods are empirically successful, their formal study is much less developed.

The talk will showcase a few results that aim to develop new algorithmic approaches and analysis tools in these settings. First, we show that matrix completion — a famous problem in machine learning — can be solved by stochastic gradient descent on the straightforward non-convex objective function in polynomial time. (Formally, we show that all local minima of the objective are also global minima.) Then, we will analyze the landscape of the objective functions for linearized residual nets, and demonstrate that over-parameterization and re-parameterization of the models can make the optimization easier.

Based on joint works with Rong Ge, Jason D. Lee, and Moritz Hardt. No prior knowledge of machine learning is assumed.
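
As a concrete illustration of the matrix completion claim above, here is a toy sketch of mine (not the authors' code, and it omits the regularizer used in the formal analysis): plain gradient descent on the non-convex factorized objective, using only a random subset of observed entries of a low-rank matrix. All problem sizes and the step size are arbitrary choices for the example.

```python
# Toy sketch of non-convex matrix completion (illustration only; the formal
# result analyzes a regularized variant of this objective).
import numpy as np

rng = np.random.default_rng(1)
n, r, p = 100, 3, 0.3                        # matrix size, rank, observation probability

Z = rng.normal(size=(n, r))
M = Z @ Z.T                                  # ground-truth rank-r PSD matrix
mask = rng.random((n, n)) < p
mask = np.triu(mask) | np.triu(mask).T       # symmetric set of observed entries

U = 0.1 * rng.normal(size=(n, r))            # random initialization
lr = 5e-4
for _ in range(3000):
    R = mask * (U @ U.T - M)                 # residual on observed entries only
    U -= lr * 4 * R @ U                      # gradient of sum_observed ((UU^T - M)_ij)^2

rel_err = np.linalg.norm(U @ U.T - M) / np.linalg.norm(M)
print(f"relative error on all entries: {rel_err:.2e}")
```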

 


Jelani Nelson
Harvard University

Optimality of the Johnson-Lindenstrauss lemma


Dimensionality reduction in Euclidean space, as attainable by the Johnson-Lindenstrauss lemma (also known as “random projections”), has been a fundamental tool in algorithm design and machine learning. The JL lemma states that any n points in Euclidean space can be mapped to m-dimensional Euclidean space while preserving all pairwise distances up to 1+ε, where m only needs to be on the order of (log n)/ε², independent of the original dimension. In this talk, I discuss our recent proof that the JL lemma is optimal, in the sense that for any n there are point sets of size n such that no embedding providing (1+ε)-distortion exists into a dimension that is more than a constant factor better than what the JL lemma guarantees. I will also discuss some subsequent work and future directions.

Joint work with Kasper Green Larsen (Aarhus University).
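
The lower bound is the hard part of the talk; for intuition about the upper-bound statement itself, here is a small numerical sketch of mine (not related to the proof): project points through a random Gaussian matrix into m on the order of (log n)/ε² dimensions and measure the worst pairwise distortion. The constant 8 and all sizes are arbitrary choices for the example.

```python
# Numerical sanity check of the JL upper bound (illustration only): project
# points with a random Gaussian matrix and measure the worst pairwise distortion.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n, d, eps = 500, 10_000, 0.3
m = int(np.ceil(8 * np.log(n) / eps**2))     # target dimension; the constant 8 is illustrative

x = rng.normal(size=(n, d))                  # n points in R^d
G = rng.normal(size=(d, m)) / np.sqrt(m)     # entries N(0, 1/m), so E||xG||^2 = ||x||^2
y = x @ G                                    # projected points in R^m

ratios = pdist(y) / pdist(x)                 # each pairwise distance after / before projection
print(f"m = {m}: distance ratios lie in [{ratios.min():.3f}, {ratios.max():.3f}]")
```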

 


Moritz Hardt
Google

Gradient Descent Learns Linear Dynamical Systems

We prove that gradient descent efficiently converges to the global optimizer of the maximum likelihood objective for learning an unknown linear time-invariant dynamical system from a sequence of noisy observations generated by the system. Even though the objective function is non-convex, we provide polynomial running time and sample complexity bounds under strong but natural assumptions. Linear systems identification has been studied for many decades, yet, to the best of our knowledge, these are the first polynomial guarantees for the problem we consider.

Joint work with Tengyu Ma and Ben Recht.
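
A toy instance of this setup (my sketch, not the paper's algorithm): fit a scalar linear time-invariant system by gradient descent on the squared prediction error of the simulated outputs, which is the maximum-likelihood objective under a Gaussian output-noise assumption. Gradients are accumulated from forward sensitivities of the hidden state; all constants are arbitrary choices for the example.

```python
# Toy sketch: learn a scalar LTI system  h_{t+1} = a*h_t + b*u_t,  y_t = h_t + noise
# by gradient descent on the squared prediction error (illustration only).
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true, T = 0.8, 1.5, 500
u = rng.normal(size=T)                        # known input sequence
h, ys = 0.0, []
for t in range(T):                            # simulate noisy observations of the system
    ys.append(h + 0.05 * rng.normal())
    h = a_true * h + b_true * u[t]

a, b, lr = 0.1, 0.1, 2e-3                     # initial guess and step size
for _ in range(3000):
    h = da = db = 0.0                         # state and its sensitivities dh/da, dh/db
    grad_a = grad_b = 0.0
    for t in range(T):
        err = h - ys[t]                       # prediction error at time t
        grad_a += 2 * err * da
        grad_b += 2 * err * db
        da = h + a * da                       # d h_{t+1} / d a
        db = u[t] + a * db                    # d h_{t+1} / d b
        h = a * h + b * u[t]
    a -= lr * grad_a / T
    b -= lr * grad_b / T

print(f"recovered a = {a:.3f} (true {a_true}), b = {b:.3f} (true {b_true})")
```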

 


Afonso Bandeira
New York University

On Phase Transitions for Spiked Random Matrix and Tensor Models


A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, in which a prominent eigenvector (or low-rank structure) is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences, where the goal is often to recover or detect the planted low-rank structure. In this talk, we discuss fundamental limitations of statistical methods to perform these tasks, as well as methods that outperform PCA at them. Emphasis will be given to low-rank structures arising in synchronization problems. Time permitting, analogous results for spiked tensor models will also be discussed.

Joint work with Amelia Perry, Alex Wein, and Ankur Moitra.
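
A small numerical illustration of mine (not from the talk) of the rank-one spiked Wigner model: with the normalization below, once the signal strength exceeds 1 the top eigenvalue separates from the bulk and the top eigenvector correlates with the planted spike, while below that threshold PCA sees essentially nothing.

```python
# Rank-one spiked Wigner model  M = lambda * v v^T + W / sqrt(n)  (illustration only).
import numpy as np

rng = np.random.default_rng(0)
n = 2000
v = rng.normal(size=n)
v /= np.linalg.norm(v)                        # unit-norm planted spike

W = rng.normal(size=(n, n))
W = (W + W.T) / np.sqrt(2)                    # symmetric noise with unit-variance off-diagonals

for lam in (0.5, 1.5):                        # below and above the spectral threshold lambda = 1
    M = lam * np.outer(v, v) + W / np.sqrt(n)
    evals, evecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    top_val, top_vec = evals[-1], evecs[:, -1]
    print(f"lambda = {lam}: top eigenvalue {top_val:.3f} (bulk edge ~ 2.0), "
          f"overlap |<v, v_top>| = {abs(v @ top_vec):.3f}")
```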

 


Irit Dinur
Weizmann Institute of Science

High dimensional expanders imply agreement tests

I will describe the notion of agreement testing, which allows one to deduce global structure from local agreement checks. In retrospect, agreement testing theorems are a key combinatorial component in nearly all PCPs.

Recently, we have discovered a connection between agreement tests and high dimensional expanders. Namely, every high dimensional expander gives rise to an agreement theorem. I will describe (some properties of) the fascinating bounded degree high dimensional expanders that were constructed by Lubotzky, Samuels, and Vishne. Since these objects are sparse, they imply a strong derandomization of direct product and direct sum testing.

Based on joint work with Tali Kaufman.
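
To make the testing setup concrete, here is a toy simulation of mine (in the dense setting of all k-element subsets, rather than the sparse high-dimensional expanders of the talk): each queried subset stores a local assignment, and a two-query test checks that two overlapping subsets agree on their intersection. Tables generated from a single global assignment with a little noise are rejected at a rate growing with the noise; the agreement theorems provide the much harder converse direction, from high acceptance to closeness to a global assignment.

```python
# Toy two-query direct-product agreement test (illustration only, on the
# complete k-subset complex rather than a sparse high-dimensional expander).
import random

random.seed(0)
n, k, trials = 30, 6, 20_000
g = [random.randint(0, 1) for _ in range(n)]          # hidden global assignment

def local_table(noise):
    """Each queried subset S gets a copy of g restricted to S, each bit flipped w.p. `noise`."""
    cache = {}
    def f(S):
        if S not in cache:
            cache[S] = {i: g[i] ^ (random.random() < noise) for i in S}
        return cache[S]
    return f

def reject_rate(f):
    rejects = 0
    for _ in range(trials):
        S = tuple(sorted(random.sample(range(n), k)))
        overlap = random.sample(S, 2)                  # force |S ∩ T| = 2
        rest = random.sample([i for i in range(n) if i not in S], k - 2)
        T = tuple(sorted(overlap + rest))
        if any(f(S)[i] != f(T)[i] for i in overlap):   # check agreement on S ∩ T
            rejects += 1
    return rejects / trials

for noise in (0.0, 0.05, 0.3):
    print(f"noise = {noise}: rejection probability = {reject_rate(local_table(noise)):.3f}")
```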

 

Alex Andoni
Columbia University

The Geometry of Similarity Search

How does one efficiently search for similar items in a large dataset, say, of images? This is a classic algorithmic task that arises when dealing with massive datasets. Its applications span many areas, including information retrieval, machine learning, data mining, computer vision, signal processing, bioinformatics, etc. For example, this task underlies the classic ‘nearest neighbor’ classification rule in machine learning. The natural solution for similarity search — to scan the entire dataset, comparing a given item to each item in the dataset — is prohibitively slow for modern datasets.

Andoni will describe how efficient solutions for similarity search benefit from the tools and perspectives of high-dimensional geometry. The latter emerged as the natural language of similarity search: e.g., a 20 x 20 pixel image naturally corresponds to a 400-dimensional vector, one coordinate per pixel. Andoni will show how geometric tools such as dimension reduction and randomized space partitions enable efficient similarity search algorithms.
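
As one concrete instance of a randomized space partition (a generic hyperplane LSH sketch of mine, not the data-dependent scheme from Andoni's recent work), the code below hashes vectors by their sign pattern against a few random hyperplanes and answers a query by scanning only the points that share a hash bucket with it; several independent tables boost the chance that a true near neighbor collides with the query.

```python
# Minimal locality-sensitive hashing with random hyperplanes (illustration only).
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
n, d = 10_000, 400                            # e.g. 20x20-pixel images as 400-dim vectors
n_tables, n_planes = 8, 12                    # several tables boost the collision probability

data = rng.normal(size=(n, d))
tables = []
for _ in range(n_tables):
    planes = rng.normal(size=(n_planes, d))   # random hyperplane normals for this table
    keys = (data @ planes.T > 0) @ (1 << np.arange(n_planes))   # pack sign bits into ints
    buckets = defaultdict(list)
    for i, key in enumerate(keys):
        buckets[key].append(i)
    tables.append((planes, buckets))

# Query with a slightly perturbed copy of data point 42.
q = data[42] + 0.1 * rng.normal(size=d)
candidates = set()
for planes, buckets in tables:
    key = int((q @ planes.T > 0) @ (1 << np.arange(n_planes)))
    candidates.update(buckets[key])

best = min(candidates, key=lambda i: np.linalg.norm(data[i] - q), default=None)
print(f"scanned {len(candidates)} of {n} points; nearest candidate index = {best}")
```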

 
