Title: Trajectory, Trajectory, Trajectory
Abstract: As we begin to “open the black box” of deep learning, it becomes clear that many important phenomena can only be understood via the trajectory taken by the training algorithm in parameter space. This talk covers three relevant works from our group.
The first gives a convergence result for training 2-layer nets (finite width, with non-lazy initialization) to global max-margin solutions. The second explores the question "Why are convolutional nets more sample-efficient than fully connected nets?" Surprisingly, this question did not have a good answer (to the best of our knowledge), and our paper shows a very strong result: concrete tasks that require only O(1) training samples for a convolutional net but quadratically many samples for every fully connected net trained with the most common learning algorithms (an infinite class of nets!). The third part introduces techniques and ideas for understanding training trajectories of real-life nets. Since stochastic gradient descent (SGD) plays an important role in training real-life nets, stochastic differential equations (SDEs) become a key tool.
(Paper 1: with Kaifeng Lyu, Runzhe Wang, and Zhiyuan Li. Paper 2 (ICLR 2021): with Zhiyuan Li and Yi Zhang. Paper 3a: with Zhiyuan Li and Sadhika Malladi. Paper 3b (NeurIPS 2020): with Zhiyuan Li and Kaifeng Lyu.)