Simons Collaboration on Algorithms & Geometry 2018 Annual Meeting
 Organized by

Mark Braverman, Ph.D.Princeton University

Amit Singer, Ph.D.Princeton University
This is the fourth annual meeting of the Simons Collaboration on Algorithms & Geometry.
Talks
Stephane Mallat
Unsupervised Learning from Max Entropy to Deep Generative Networks
Generative convolutional networks have obtained spectacular results to synthesize complex signals such as images, speech, music, with barely any mathematical understanding. This lecture will move towards this world by beginning from well relatively understood maximum entropy modelization. We first show that nonGaussian and nonMarkovian stationary processes requires to separate scales and measure scale interactions, which can be done with a deep neural network. Applications to turbulence models in physics and cosmology will be shown.
We shall review deep Generative networks such as GAN and Variational Encoders, which can synthesize realizations of nonstationary processes or highly complex processes such as speech or music. We show that they can be considerably simplified by defining the estimation as an inverse problem. This will build a bridge with maximum entropy estimation. Applications will be shown on images, speech and music generation.
Rachel Ward
Learning the learning rate in gradient descent methods
Finding a proper learning rate in stochastic optimization is an important problem. Choosing a learning rate that is too small leads to painfully slow convergence, while a learning rate that is too large can cause the loss function to fluctuate around the minimum or even to diverge. In practice, the learning rate is often tuned by hand for different problems at hand. Several methods have been proposed recently for automatic adjustment of the learning rate according to gradient data that is received along the way. We review these methods, and propose a simple method, inspired by reparametrization of the loss function in polar coordinates. We prove that the proposed method achieves optimal oracle convergence rates in batch and stochastic settings, but without having to know certain parameters of the loss function in advance.
This is joint work with Xiaoxia Wu and Leon Bottou
Konstantin Tikhomirov
The circular law for very sparse random matrices
Consider an n by n random matrix with i.i.d Bernoulli entries, where each entry is equal one with probability p. In the regime when p is at least logarithmic in dimension n, the circular law for the limiting spectral distribution (of appropriately rescaled matrix) as the dimension grows to infinity, was previously established in works of TaoVu, GotzeTikhomirov, BasakRudelson. In our work, we consider the setting allowing p to be sublogarithmic; in particular, this implies that the random matrix is singular with high probability. With some restrictions on p, we are able to establish the limiting law for the spectrum for this random model, as well as for some of its generalizations. This is joint work with Mark Rudelson.
Aleksandar Nikolov
Balancing Vectors in Any Norm
In the vector balancing problem, we are given symmetric convex bodies C and K in ndimensional space, and our goal is to determine the minimum number b, known as the vector balancing constant from C to K, such that for any sequence of vectors in C there always exists a signed combination of them lying inside bK. This quantity goes back to a question of Dvoretzky from 1963. Many fundamental results in discrepancy theory, such as the BeckFiala theorem (Discrete Applied Mathematics, Vol 3 Issue 1, 1981), Spencer’s “six standard deviations suffice” theorem (Transactions of the American Mathematical Society, Vol 289 Issue 2, 1985) and Banaszczyk’s vector balancing theorem (Random Structures & Algorithms, Vol 12 Issue 4, 1998) correspond to bounds on vector balancing constants.
The above theorems have inspired much research in recent years within theoretical computer science, from the development of efficient polynomial time algorithms that match existential vector balancing guarantees, to their applications in the context of approximation algorithms. In this work, we show that all vector balancing constants admit “good” approximate characterizations, with approximation factors depending only polylogarithmically on the dimension n. First, we show that a volumetric lower bound due to Banaszczyk is tight within a O(log n) factor. Second, we present a novel convex program which encodes the “best possible way” to apply Banaszczyk’s vector balancing theorem for bounding vector balancing constants from above, and show that it is tight within an O(log^{2.5} n) factor. This also directly yields a corresponding polynomial time approximation algorithm both for vector balancing constants, and for the hereditary discrepancy of any sequence of vectors with respect to an arbitrary norm. Our results yield the first guarantees which depend only polylogarithmically on the dimension of the norm ball K.
Benjamin Recht
The statistical foundations of learning to control
Given the dramatic successes in machine learning over the past halfdecade, there has been a resurgence of interest in applying learning techniques to continuous control problems in robotics, selfdriving cars, and unmanned aerial vehicles. Though such applications appear to be straightforward generalizations of what is known as reinforcement learning, few fundamental baselines have been established prescribing how well one must know a system in order to control it. In this talk, I will discuss how one might merge techniques from statistical learning theory with robust control to derive baselines for such continuous control. I will further describe how these simple baselines give us insights into shortcomings of existing reinforcement learning methodology. I will close by listing several exciting open problems that must be solved before we can build robust, safe learning systems that interact with an uncertain physical environment.
Subhash Khot
Hardness of Approximation: From the PCP Theorem to the 2to2 Games Theorem
Researchers firmly believe that no algorithm can efficiently compute optimal solutions to computationally complex problems called NPhard problems. For many NPhard problems, even computing an approximatelyoptimal solution is NPhard. This phenomenon, known as the hardness of approximation, has numerous connections to proof checking, optimization, combinatorics, discrete Fourier analysis, and geometry.
In this lecture, Subhash Khot will provide an overview of those connections. He will address why graph coloring is a computationally hard problem, how it is possible to check a proof without even looking at it, why computational scientists love the majority vote, and whether a shape exists that looks spherical as well as cubical. He will explain how all this fits into a 30year research program starting with the foundational probabilistically checkable proofs (PCP) theorem and leading to the recent 2to2 games theorem.
For further information on Professor Khot’s lecture, see its event page.