Title: Some naive open problems about optimization in high dimensions: how much data is needed to tame topological complexity and entropy with simple algorithms?
Abstract: I will discuss some open problems around the performance of gradient descent (and its variants) for empirical loss minimization in high-dimensional settings. What is the role of the topological complexity of the random landscape defined by the empirical loss? What is the role of the entropy near the random initialization, and how fast does the dynamics find the informative region, i.e., how hard is it to do better than random? What is the role of the complexity in the non-informative region? In the informative one? What is the role of the structure of the signal?
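To fix ideas, here is a minimal toy sketch (my own illustrative choice, not a model from the talk) of the kind of dynamics in question: gradient descent on the empirical loss of a noiseless phase-retrieval model, a simple non-convex statistical problem. We track the overlap of the iterate with the planted signal, the quantity measuring how far the dynamics has escaped the uninformative region around the random initialization. The dimensions, step size, and iteration count are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 20, 1000
v = rng.standard_normal(d)
v /= np.linalg.norm(v)            # planted signal on the unit sphere
X = rng.standard_normal((n, d))   # Gaussian data
y = (X @ v) ** 2                  # noiseless phase-retrieval observations

def loss(w):
    # empirical quartic loss L(w) = (1/4n) sum_i ((x_i.w)^2 - y_i)^2
    return np.mean(((X @ w) ** 2 - y) ** 2) / 4

def grad(w):
    # gradient of L: (1/n) sum_i ((x_i.w)^2 - y_i) (x_i.w) x_i
    s = X @ w
    return (X * ((s ** 2 - y) * s)[:, None]).mean(axis=0)

w = rng.standard_normal(d)
w /= np.linalg.norm(w)            # uninformative random start

overlaps = [abs(w @ v)]           # initial overlap is O(1/sqrt(d)): barely better than random
losses = [loss(w)]
lr = 0.02
for _ in range(1000):
    w = w - lr * grad(w)
    overlaps.append(abs(w @ v) / np.linalg.norm(w))
    losses.append(loss(w))
```

At this oversampling (n/d = 50) the empirical landscape is benign enough that plain gradient descent typically escapes the uninformative region and the overlap grows toward 1; the interesting regimes discussed in the talk are precisely those where n is too small for this to happen easily.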
All these questions are way too hard (at least for me) in serious models of Machine Learning. But I will try to discuss them in simpler statistical contexts in order to introduce some of the needed theoretical tools, both for the topological complexity questions and for the understanding of the dynamics. These tools are progressing slowly, at least in the Math community, while they are moving faster in Physics. They still need to be considerably expanded and strengthened to end up being useful in harder models.
I will base this talk on joint works with P. Bourgade, G. Biroli, C. Cammarota, Y. Fyodorov, R. Gheissari, A. Jagannath, B. Khoruzhenko, A. Maillard, B. McKenna, S. Mei, A. Montanari, M. Nica, V. Ros, A. Wein, and I. Zadik.