Simons Collaboration on Algorithms & Geometry 2017 Annual Meeting
Speakers
Tengyu Ma  Princeton University 
Jelani Nelson  Harvard University 
Moritz Hardt  Google 
Afonso Bandeira  New York University 
Irit Dinur  Weizmann Institute of Science 
Alex Andoni  Columbia University 

Meeting Report
As in previous years, the A&G annual meeting started with conference-style talks on recent important progress and ended with a public talk by an A&G member. Irit Dinur started the day with a talk that presented an interesting connection between the agreement-testing problem that occurs in complexity theory (mainly in probabilistically checkable proofs) and the expansion of simplicial complexes. Here, the notion of expansion has more than one possible interpretation, and Dinur formulated a notion that is suitable for the agreement-testing problem. This notion holds for simplicial complexes that arise from the Ramanujan complexes of Lubotzky, Samuels and Vishne, thus leading to new strong derandomization results.
Tengyu Ma presented rigorous results that explain the empirically observed success of nonconvex optimization problems on typical instances that arise from certain probability distributions on inputs. He demonstrated this by showing that for certain problems and certain distributions, with high probability every local minimum is also a global minimum. Such issues are ubiquitous in learning theory, and this result is a step toward explaining the success of currently used heuristic methods. Jelani Nelson presented his recent solution of the longstanding question of the optimality of the Johnson-Lindenstrauss dimensionality reduction lemma. This 1984 lemma, which is a powerful tool in many applications, asserts that for every \(\epsilon>0\) and \(n\) vectors \(x_1,\dots,x_n\in R^n\), there exist \(k=O(\epsilon^{-2}\log n)\) and vectors \(y_1,\dots,y_n\in R^k\) such that $$(1-\epsilon)\|x_i-x_j\| \leq \|y_i-y_j\| \leq (1+\epsilon)\|x_i-x_j\|$$ for all \(i,j\). The dependence of the reduced dimension \(k\) on the number of vectors \(n\) is easily seen to be sharp, but it was unknown whether the dependence on \(\epsilon\) as \(\epsilon\rightarrow 0\) is sharp. Nelson proved that this dependence on \(\epsilon\) is sharp for all values of \(\epsilon\) that are not very small (namely, not smaller than a power of \(1/n\)). This is based on a new and clever coding scheme.
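The guarantee in the lemma can be checked numerically with a scaled Gaussian random projection, one standard way to realize the JL bound. The following is a sketch with illustrative constants (not Nelson's lower-bound construction):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, d, eps = 100, 1000, 0.5
k = int(np.ceil(24 * np.log(n) / eps**2))     # k = O(eps^-2 log n); constant chosen for illustration

X = rng.standard_normal((n, d))               # n arbitrary points in R^d
G = rng.standard_normal((k, d)) / np.sqrt(k)  # scaled Gaussian projection matrix
Y = X @ G.T                                   # reduced points in R^k

# ratio of reduced to original distance, for every pair of points
ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i, j in combinations(range(n), 2)]
```

With a sufficiently large constant in \(k\), all pairwise ratios concentrate inside \([1-\epsilon, 1+\epsilon]\), even though \(k \ll d\).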
Moritz Hardt presented interesting work about learning an unknown, linear, time-invariant dynamical system from a sequence of noisy observations generated by the system. This work relates the question to certain issues in complex analysis. These results are a thought-provoking first step toward obtaining a rigorous theory for linear systems identification problems that have been investigated for a long time using heuristic methods. The final talk of the non-public part of the A&G annual meeting was by Afonso Bandeira, who discussed questions on random matrix models in which a low-rank matrix is added to a random matrix. The question is how large the low-rank perturbation must be in order for it to be statistically detectable. Such questions arise in natural statistical models for principal component analysis, and Bandeira’s talk demonstrated fundamental limitations of statistical methods for performing these tasks.
The day ended with a public talk by Alex Andoni on nearest-neighbor search, which is a very important algorithmic task with a wide range of applications in many areas. Such questions are being investigated at A&G intensively (including very recent and ongoing progress), but this talk was focused more on an introduction to the topic for nonspecialists. The talk was attended by many people, and there was a lot of audience interest and participation. I don’t know the vast majority of those who came (you have the list of registrants, so you may know more), but based on the questions that were asked throughout the talk, it seems that they were people interested in such topics from either industry or academia. The overall theme of Andoni’s talk was how to build data structures that answer approximate nearest-neighbor queries in sublinear time. The methods here are geometric, and quite remarkably, they yield efficient procedures that work irrespective of the input point set. Andoni presented the fundamental questions in this context, as well as some of the fundamental techniques (including his recent work on data-dependent locality-sensitive hashing).
Talks
Tengyu Ma
Princeton University
Better understanding of nonconvex methods in machine learning
Recent breakthroughs in machine learning, especially deep learning, often involve learning complex and high-dimensional models on massive datasets with nonconvex optimization. Though such nonconvex methods are empirically successful, their formal study is much less developed.
The talk will showcase a few results that aim to develop new algorithmic approaches and analysis tools in these settings. First, we show that matrix completion — a famous problem in machine learning — can be solved by stochastic gradient descent on the straightforward nonconvex objective function in polynomial time. (Formally, we show that all local minima of the objective are also global minima.) Then, we will analyze the landscape of the objective functions for linearized residual nets, and demonstrate that overparameterization and reparameterization of the models can make the optimization easier.
Based on joint works with Rong Ge, Jason D. Lee, and Moritz Hardt. No prior knowledge on machine learning is assumed.
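A toy illustration of the matrix completion statement above, assuming a small synthetic rank-2 instance: this sketch runs plain gradient descent (rather than the stochastic version analyzed in the work) on the nonconvex factorized objective, with illustrative constants throughout.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 30, 2

U_true = rng.standard_normal((n, r))
M = U_true @ U_true.T                       # ground-truth low-rank matrix
mask = rng.random((n, n)) < 0.5
mask = mask | mask.T                        # symmetric set of observed entries

def obs_err(U):
    # Frobenius norm of the residual on observed entries
    return np.linalg.norm(mask * (U @ U.T - M))

U = 0.1 * rng.standard_normal((n, r))       # small random initialization
err0 = obs_err(U)
lr = 1e-3                                   # illustrative step size
for _ in range(5000):
    R = mask * (U @ U.T - M)
    U = U - lr * 4 * R @ U                  # gradient of ||mask ∘ (UU^T − M)||_F^2
```

In this toy run, gradient descent from small random initialization drives the observed-entry error toward zero, consistent with the no-spurious-local-minima phenomenon the talk describes.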
Jelani Nelson
Harvard University
Optimality of the Johnson-Lindenstrauss lemma
Dimensionality reduction in Euclidean space, as attainable by the Johnson-Lindenstrauss lemma (also known as “random projections”), has been a fundamental tool in algorithm design and machine learning. The JL lemma states that any n points in Euclidean space can be mapped to m-dimensional Euclidean space while preserving all pairwise distances up to a factor of 1+ε, where m only needs to be on the order of (log n)/ε^2, independent of the original dimension. In this talk, I discuss our recent proof that the JL lemma is optimal, in the sense that for any n there are point sets of size n for which no embedding with (1+ε)-distortion exists into a dimension more than a constant factor smaller than what the JL lemma guarantees. I will also discuss some subsequent work and future directions.
Joint work with Kasper Green Larsen (Aarhus University).
Moritz Hardt
Google
Gradient Descent Learns Linear Dynamical Systems
We prove that gradient descent efficiently converges to the global optimizer of the maximum likelihood objective of an unknown linear time-invariant dynamical system from a sequence of noisy observations generated by the system. Even though the objective function is nonconvex, we provide polynomial running time and sample complexity bounds under strong but natural assumptions. Linear systems identification has been studied for many decades, yet, to the best of our knowledge, these are the first polynomial guarantees for the problem we consider.
Joint work with Tengyu Ma and Ben Recht.
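The following is a minimal sketch of the setting for a hypothetical scalar system x_{t+1} = a·x_t + b·u_t with noisy observations of the state. It fits (a, b) by gradient descent using forward-mode sensitivities, with normalized steps and projection onto a stable range; this is in the spirit of, but not identical to, the projected method the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(2)
a_true, b_true, T = 0.8, 1.5, 200           # hypothetical scalar LTI system

u = rng.standard_normal(T)                  # random control inputs
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = a_true * x[t] + b_true * u[t]
y = x[1:] + 0.01 * rng.standard_normal(T)   # noisy observations of the state

def mse_and_grad(a, b):
    # simulate the model, and compute d(mse)/da, d(mse)/db by forward sensitivities
    xh = np.zeros(T + 1)
    da = np.zeros(T + 1)
    db = np.zeros(T + 1)
    for t in range(T):
        xh[t + 1] = a * xh[t] + b * u[t]
        da[t + 1] = xh[t] + a * da[t]       # d xh[t+1] / d a
        db[t + 1] = u[t] + a * db[t]        # d xh[t+1] / d b
    e = xh[1:] - y
    return np.mean(e**2), 2 * (e @ da[1:]) / T, 2 * (e @ db[1:]) / T

a, b, step = 0.0, 0.0, 0.01
mse0, _, _ = mse_and_grad(a, b)
for _ in range(2000):
    _, ga, gb = mse_and_grad(a, b)
    norm = max(np.hypot(ga, gb), 1e-12)
    a -= step * ga / norm                   # normalized gradient step
    b -= step * gb / norm
    a = min(max(a, -0.95), 0.95)            # project onto a stable region |a| < 1
mse_final, _, _ = mse_and_grad(a, b)
```

Despite the nonconvexity of the prediction-error objective, the iterates steadily reduce the mean squared prediction error on this stable scalar instance.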
Afonso Bandeira
New York University
On Phase Transitions for Spiked Random Matrix and Tensor Models
A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, in which a prominent eigenvector (or low-rank structure) is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences, where the goal is often to recover or detect the planted low-rank structure. In this talk we discuss fundamental limitations of statistical methods to perform these tasks, as well as methods that outperform PCA at them. Emphasis will be given to low-rank structures arising in synchronization problems. Time permitting, analogous results for spiked tensor models will also be discussed.
Joint work with Amelia Perry, Alex Wein, and Ankur Moitra.
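A quick simulation of the simplest such model, a spiked Wigner matrix Y = W + λvvᵀ, illustrates the kind of phase transition the talk concerns (parameters are illustrative; under this normalization the critical value is λ = 1):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
v = rng.standard_normal(n)
v /= np.linalg.norm(v)                       # planted unit-norm spike

def top_eig_and_overlap(lam):
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2 * n)           # Wigner matrix, spectrum roughly [-2, 2]
    vals, vecs = np.linalg.eigh(W + lam * np.outer(v, v))
    return vals[-1], abs(vecs[:, -1] @ v)    # top eigenvalue, overlap with the spike

# below the threshold, the top eigenvalue stays near the bulk edge and the top
# eigenvector carries little information about v; above it, both "pop out"
val_lo, overlap_lo = top_eig_and_overlap(0.5)
val_hi, overlap_hi = top_eig_and_overlap(2.0)
```

For λ > 1 the top eigenvalue separates toward λ + 1/λ and the top eigenvector correlates strongly with the spike, while for λ < 1 spectral detection fails, which is the kind of fundamental limitation the talk addresses.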
Irit Dinur
Weizmann Institute of Science
High dimensional expanders imply agreement tests
I will describe the notion of agreement testing, which allows one to deduce global structure from local agreement checks. In retrospect, agreement testing theorems are a key combinatorial component in nearly all PCPs.
Recently, we have discovered a connection between agreement tests and high dimensional expanders. Namely, every high dimensional expander gives rise to an agreement theorem. I will describe (some properties of) the fascinating bounded degree high dimensional expanders that were constructed by Lubotzky, Samuels, and Vishne. Since these objects are sparse, they imply a strong derandomization of direct product and direct sum testing.
Based on joint work with Tali Kaufman.
Alex Andoni
Columbia University
The Geometry of Similarity Search
How does one efficiently search for similar items in a large dataset, say, of images? This is a classic algorithmic task that arises when dealing with massive datasets. Its applications span many areas, including information retrieval, machine learning, data mining, computer vision, signal processing, and bioinformatics. For example, this task underlies the classic ‘nearest neighbor’ classification rule in machine learning. The natural solution for similarity search — to scan the entire dataset, comparing a given item to each item in the dataset — is prohibitively slow for modern datasets.
Andoni will describe how efficient solutions for similarity search benefit from the tools and perspectives of high-dimensional geometry. The latter emerged as the natural language of similarity search: e.g., a 20 x 20 pixel image naturally corresponds to a 400-dimensional vector, one coordinate per pixel. Andoni will show how geometric tools such as dimension reduction and randomized space partitions enable efficient similarity search algorithms.
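As a concrete taste of the randomized-space-partition idea, here is a sketch of one classic data-independent scheme, random-hyperplane locality-sensitive hashing for cosine similarity. The parameters and the tiny synthetic dataset are purely illustrative; Andoni's recent work concerns more sophisticated data-dependent partitions.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_planes, n_tables = 50, 12, 8            # illustrative parameters

# each table hashes a vector to its sign pattern against a few random hyperplanes
planes = rng.standard_normal((n_tables, n_planes, d))

def hashes(x):
    return [tuple(P @ x > 0) for P in planes]

data = rng.standard_normal((1000, d))        # synthetic dataset
tables = [{} for _ in range(n_tables)]
for idx, point in enumerate(data):
    for t, key in enumerate(hashes(point)):
        tables[t].setdefault(key, []).append(idx)

def query(q):
    # candidates: points colliding with q in at least one table; rank by true distance
    cand = set()
    for t, key in enumerate(hashes(q)):
        cand.update(tables[t].get(key, []))
    return min(cand, key=lambda i: np.linalg.norm(data[i] - q), default=None)
```

Nearby points collide in some table with high probability, so a query inspects only a small fraction of the dataset instead of scanning all of it — the sublinear-time behavior described above.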