Title: Tractability, generalization and overparametrization in linear and nonlinear models
Abstract: Empirical risk minimization (ERM) is the dominant paradigm in statistical learning. Optimizing the empirical risk of neural networks is a highly non-convex optimization problem but, despite this, it is routinely solved to optimality or near-optimality using first-order methods such as stochastic gradient descent. It has recently been argued that overparametrization plays a key role in explaining this puzzle: overparametrized models are easy to optimize, achieving vanishing or nearly vanishing training error. Surprisingly, the overparametrized models learnt by gradient-based methods appear to have good generalization properties. I will present recent results on these phenomena, both in linear models that are directly motivated by the analysis of two-layer neural networks, and in some simple nonlinear models.
[Based on joint work with Yiqiao Zhong and Kangjie Zhou]
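The interpolation phenomenon described in the abstract can be illustrated in the simplest setting the talk considers: an overparametrized linear model (more parameters than samples) trained by plain gradient descent. The sketch below is purely illustrative (all data, dimensions, and step-size choices are assumptions, not taken from the talk); it shows gradient descent from zero initialization driving the training error to essentially zero while converging to the minimum-norm interpolating solution.

```python
import numpy as np

# Illustrative overparametrized least-squares problem: n samples, d > n features.
rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Gradient descent on the empirical risk 0.5 * ||X w - y||^2,
# starting from w = 0, with step size 1 / sigma_max(X)^2.
w = np.zeros(d)
lr = 1.0 / np.linalg.norm(X, 2) ** 2
for _ in range(5000):
    w -= lr * (X.T @ (X @ w - y))

# Training error is (numerically) zero: the model interpolates the data.
train_loss = np.mean((X @ w - y) ** 2)

# From zero initialization, gradient descent stays in the row space of X,
# so it converges to the minimum-norm interpolant X^T (X X^T)^{-1} y.
w_minnorm = X.T @ np.linalg.solve(X @ X.T, y)
```

This implicit bias toward the minimum-norm solution is one concrete mechanism by which a gradient-based method can pick out a well-generalizing model among the infinitely many interpolating ones.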