2024 Mathematical and Scientific Foundations of Deep Learning Annual Meeting
Organizers:
Peter Bartlett, University of California, Berkeley
René Vidal, University of Pennsylvania
Meeting Goals:
This meeting will bring together members of the NSF-Simons Research Collaborations on the Mathematical and Scientific Foundations of Deep Learning (MoDL) and researchers working on related topics. The focus of the meeting is the set of challenging theoretical questions posed by deep learning methods and the development of mathematical and statistical tools to understand their success and limitations, to guide the design of more effective methods, and to initiate the study of the mathematical problems that emerge. The meeting aims to report on progress in these topics and to stimulate discussions of future directions.
-
Agenda
Thursday, September 26
8:30 AM CHECK-IN & BREAKFAST
9:30 AM Rong Ge | What can linear transformers learn in context?
10:30 AM BREAK / POSTER PRESENTATION
11:00 AM Elchanan Mossel | Why depth? A theoretical perspective on the advantages of depth in inference
12:00 PM LUNCH
1:00 PM Jeremias Sulam | Yay, my deep network works! But… what did it learn?
2:00 PM BREAK / POSTER PRESENTATION
2:30 PM Bin Yu | Efficient fine-tuning of large deep-learning models via infinite-width theory and experiments
3:30 PM BREAK / POSTER PRESENTATION
4:00 PM Nikolai Matni | What makes learning to control easy or hard?
5:00 PM DAY ONE CONCLUDES
Friday, September 27
8:30 AM CHECK-IN & BREAKFAST
9:30 AM Jingfeng Wu | Reimagining Gradient Descent: Large Stepsize, Oscillation, and Acceleration
10:30 AM BREAK
11:00 AM Gitta Kutyniok | Reliable AI: From mathematical foundations to next generation AI computing
12:00 PM LUNCH
1:00 PM Misha Belkin | Emergence and Grokking in "Simple" Architectures
2:00 PM MEETING CONCLUDES
-
Abstracts
Misha Belkin
University of California at San Diego
Emergence and Grokking in “Simple” Architectures
In recent years transformers have become a dominant machine learning methodology. A key element of transformer architectures is a standard neural network (MLP). I argue that MLPs alone already exhibit many remarkable behaviors observed in modern LLMs, including emergent phenomena. Furthermore, despite large amounts of work, we are still far from understanding how 2-layer MLPs learn relatively simple problems, such as “grokking” modular arithmetic. I will discuss recent progress and will argue that feature-learning kernel machines (Recursive Feature Machines) isolate some key computational aspects of modern neural architectures and are preferable to MLPs as a model for analysis of emergent phenomena as well as a powerful predictor in their own right.
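For context on the “grokking” of modular arithmetic mentioned above, here is a minimal sketch (not from the talk; the modulus, width, data split, and weight decay are illustrative assumptions) of the standard setup: a plain 2-layer MLP trained on one-hot pairs (a, b) with target (a + b) mod p, where accuracy on held-out pairs typically jumps long after the training loss has reached zero.

```python
import torch

# Modular-addition dataset: one-hot pairs (a, b), target (a + b) mod p.
p, hidden = 23, 256
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
X = torch.cat([torch.nn.functional.one_hot(pairs[:, 0], p),
               torch.nn.functional.one_hot(pairs[:, 1], p)], dim=1).float()
y = (pairs[:, 0] + pairs[:, 1]) % p

# A plain 2-layer MLP, no attention.
mlp = torch.nn.Sequential(
    torch.nn.Linear(2 * p, hidden),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden, p),
)

# Train on half of the p*p pairs; in grokking experiments, accuracy on the
# held-out half typically rises long after the training loss reaches zero.
perm = torch.randperm(len(X))
train_idx, test_idx = perm[: len(X) // 2], perm[len(X) // 2 :]
opt = torch.optim.AdamW(mlp.parameters(), lr=1e-3, weight_decay=1.0)
for step in range(5000):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(mlp(X[train_idx]), y[train_idx])
    loss.backward()
    opt.step()
test_acc = (mlp(X[test_idx]).argmax(dim=1) == y[test_idx]).float().mean()
```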
Rong Ge
Duke University
What can linear transformers learn in context?
Large language models exhibit strong in-context learning capabilities — their performance can improve given few in-context examples provided in the prompt. Recent research used linear regression as a simple setting to understand in-context learning. Results have demonstrated that transformers, particularly linear attention models, implicitly execute gradient-descent-like algorithms on data provided in context during their forward inference step. In this talk, we show that even linear transformers are very versatile and can go beyond simple gradient descent on more interesting data. In particular, we show how the same linear transformer can simultaneously handle regression problems with different noise level and how they can leverage a task descriptor.
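As a concrete illustration of the gradient-descent view of linear attention mentioned above, here is a minimal NumPy sketch (dimensions and stepsize are arbitrary assumptions, not details from the talk): one step of GD from zero on the in-context squared loss predicts with w = (η/n) Σᵢ yᵢ xᵢ, and a single linear-attention head with values yᵢ xᵢ and a learned readout matrix P reproduces this prediction exactly when P = (η/n) I.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 5, 20, 0.1

# In-context examples (x_i, y_i) from a random linear task, plus a query point.
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star
x_query = rng.normal(size=d)

# One GD step from w = 0 on the in-context loss (1/2n) sum_i (y_i - <w, x_i>)^2
# gives w_1 = (eta/n) sum_i y_i x_i.
w_gd = (eta / n) * (X.T @ y)
pred_gd = x_query @ w_gd

# A linear-attention readout (no softmax): query dotted with P sum_i y_i x_i.
# Choosing P = (eta/n) I reproduces the one-step GD prediction.
P = (eta / n) * np.eye(d)
pred_attn = x_query @ (P @ (X.T @ y))

print(np.allclose(pred_gd, pred_attn))  # True
```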
Gitta Kutyniok
Ludwig Maximilian University of Munich
Reliable AI: From mathematical foundations to next generation AI computing
Artificial intelligence is currently leading to one breakthrough after the other, in industry, public life and the sciences. However, one current major drawback worldwide, in particular, in light of regulations such as the EU AI Act and the G7 Hiroshima AI Process, is the lack of reliability of such methodologies.
In this talk, we will first highlight the role of a mathematical perspective to this highly topical research direction and survey our recent advances concerning generalization bounds and reliable explainability approaches. We then discuss fundamental limitations in terms of computability, which affect AI’s reliability, and show solutions to this serious obstacle by revealing an intriguing connection to next generation AI computing, thereby also touching upon the enormous energy problem of current AI technology.
Nikolai Matni
University of Pennsylvania
What makes learning to control easy or hard?
Designing autonomous systems that are simultaneously high-performing, adaptive and provably safe remains an open problem. In this talk, we will argue that in order to meet this goal, new theoretical and algorithmic tools are needed that blend the stability, robustness and safety guarantees of robust control with the flexibility, adaptability and performance of machine and reinforcement learning. We will highlight our progress towards developing such a theoretical foundation of robust learning for safe control in the context of the following case studies: (i) characterizing fundamental limits of learning-enabled control, (ii) developing novel robust imitation learning algorithms with finite sample-complexity guarantees and, if time allows, (iii) leveraging data from diverse but related tasks for efficient multi-task learning for control. In all cases, we will emphasize the interplay between robust learning, robust control and robust stability and their consequences on the sample-complexity and generalizability of the resulting learning-based control algorithms.
Elchanan Mossel
Massachusetts Institute of Technology
Why depth? A theoretical perspective on the advantages of depth in inference
Can theory help explain the success of deep nets on real data? One avenue to explore this question is to ask if we can find
1. Natural data models where:
2. Inference is computationally and statistically efficient,
3. Inference requires depth (or some other measure of complexity) and
4. The inference procedure can be learned efficiently from data.
As proving depth lower bounds in theoretical computer science for explicit objects is hard, perhaps the most difficult task is to establish 3. I will discuss some recent works that try to establish 1–4 for the broadcast model on the tree, where the inference procedure is belief propagation.
Based on:
- https://arxiv.org/pdf/2402.13359
- https://dl.acm.org/doi/abs/10.1145/3564246.3585155
- https://proceedings.neurips.cc/paper_files/paper/2022/hash/77e6814d32a86b76123bd10aa7e2ad81-Abstract-Conference.html
- https://proceedings.mlr.press/v125/moitra20a.html
- https://arxiv.org/pdf/1612.09057
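To make the inference procedure concrete, here is a minimal sketch (illustrative, not taken from the papers above; the tree encoding and flip probability are assumptions) of belief propagation for the binary broadcast model on a tree: the root spin is ±1 uniformly at random, each child copies its parent's spin but flips it independently with probability δ, and only the leaves are observed.

```python
import numpy as np

# The tree is a nested tuple (left_subtree, right_subtree) with observed +/-1 leaves.

def upward_message(node, delta):
    """Return [P(leaves below | node = +1), P(leaves below | node = -1)]."""
    if node in (+1, -1):                       # observed leaf
        return np.array([1.0, 0.0]) if node == +1 else np.array([0.0, 1.0])
    flip = np.array([[1 - delta, delta],       # P(child state | parent state)
                     [delta, 1 - delta]])
    msg = np.ones(2)
    for child in node:                         # children are independent given the node
        msg *= flip @ upward_message(child, delta)
    return msg

def root_posterior(tree, delta):
    m = upward_message(tree, delta)
    return m[0] / m.sum()                      # P(root = +1 | leaves), uniform prior

# Depth-2 example in which three of the four leaves agree with +1.
tree = ((+1, +1), (+1, -1))
print(root_posterior(tree, delta=0.1))
```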
Jeremias Sulam
Johns Hopkins University
Yay, my deep network works! But… what did it learn?
Modern machine-learning methods are revolutionizing what we can do with data, from TikTok video recommendations to biomarker discovery in cancer research. Yet the complexity of these deep models makes it harder to understand what functions they compute and which features they detect and regard as important for a given task. In this talk, I will review two approaches for making deep-learning models more interpretable: in an unsupervised setting, in the context of imaging inverse problems, through learned proximal networks; and in supervised classification problems for computer vision, by testing for the semantic importance of concepts via betting.
Jingfeng Wu
University of California, Berkeley
Reimagining gradient descent: Large stepsize, oscillation, and acceleration
Gradient descent (GD) and its variants are pivotal in machine learning, particularly deep learning. Conventional wisdom suggests smaller stepsizes for stability, yet in practice, larger stepsizes often yield faster convergence and improved generalization, despite initial instability. This talk delves into the dynamics of GD with a constant stepsize applied to logistic regression with linearly separable data, where the stepsize η is so large that the loss initially oscillates. We show that GD exits the initial oscillatory phase rapidly, in O(η) steps, and subsequently achieves an O(1/(tη)) convergence rate. Our results imply that, given a budget of T steps, GD can achieve an accelerated loss of O(1/T^2) with an aggressive stepsize of η = Θ(T), without any use of momentum or variable stepsize schedulers. This suggests that large-stepsize GD achieves accelerated optimization by entering an initially unstable regime. Based on the new insights drawn from the linear model, I will further discuss the provable benefits of large stepsizes for GD in training non-linear neural networks.
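As a toy illustration of the oscillate-then-converge behavior described above, here is a minimal NumPy/SciPy sketch (the data, dimensions, and stepsize are arbitrary assumptions, not the exact setting of the results): GD with a deliberately large constant stepsize on logistic regression with linearly separable data, where the training loss is typically non-monotone over the first iterations and decreases afterwards.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(1)
n, d = 50, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)                      # labels in {-1, +1}: linearly separable

def loss(w):
    # mean logistic loss log(1 + exp(-y <w, x>)), computed stably
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def grad(w):
    p = expit(-y * (X @ w))                  # sigmoid(-y <w, x>)
    return -(X.T @ (y * p)) / n

eta = 50.0                                   # stepsize far above the 2/smoothness threshold
w = np.zeros(d)
for t in range(201):
    if t < 8 or t % 50 == 0:
        print(f"t={t:3d}  loss={loss(w):.4f}")
    w -= eta * grad(w)
```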
Bin Yu
University of California, Berkeley
Efficient fine-tuning of large deep-learning models via infinite-width theory and experiments
In this talk, we will describe our recent works that use both infinite-width theory and extensive experiments to obtain practical lessons regarding three aspects of low-rank adaptation (LoRA) for efficient fine-tuning of large deep-learning models:
1. Learning rate parametrization: We show that the standard LoRA with the same learning rate for A and B is suboptimal in the sense that it leads to inefficient fine-tuning (in large models), and we propose a simple modification: set a much larger learning rate for matrix B to achieve more efficient feature learning.
2. Learning rate transfer: We show that decreasing the model size of a large pre-trained model in a principled way still preserves the optimal hyperparameters for fine-tuning, extending previous work from Yang et al. (2022) for model pre-training. In particular, we introduce a novel non-uniform downsampling procedure for decreasing model size by combining results from infinite-width neural network theory with classical statistical sampling theory.
3. Impact of initialization: We briefly discuss how initialization influences LoRA fine-tuning dynamics and show that one initialization scheme (A random and B set to zero) generally leads to better performance than the other (A set to zero and B random).
This talk is based on joint work with Nikhil Ghosh and Soufiane Hayou at University of California, Berkeley.
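Below is a minimal PyTorch sketch of lessons 1 and 3 above (shapes, learning rates, and the specific ratio between them are illustrative assumptions, not recommendations from the talk): a rank-r LoRA adapter BA added to a frozen linear layer, with A initialized at random, B initialized at zero, and a larger learning rate assigned to B through optimizer parameter groups.

```python
import torch

d_in, d_out, rank = 64, 64, 4
base = torch.nn.Linear(d_in, d_out, bias=False)
base.weight.requires_grad_(False)                  # frozen pre-trained weight

A = torch.nn.Parameter(torch.randn(rank, d_in) / d_in**0.5)   # A: random init
B = torch.nn.Parameter(torch.zeros(d_out, rank))              # B: zero init

def lora_forward(x):
    return base(x) + x @ A.T @ B.T                 # (W + B A) x

# Larger learning rate for B than for A via parameter groups (ratio illustrative).
opt = torch.optim.AdamW([
    {"params": [A], "lr": 1e-4},
    {"params": [B], "lr": 1e-3},
])

x = torch.randn(8, d_in)
loss = lora_forward(x).pow(2).mean()               # placeholder objective
loss.backward()
opt.step()
```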
-
Participation & Funding
Participation in the meeting falls into the following four categories. An individual’s participation category is communicated via their letter of invitation.
Group A – Organizers and Speakers
- Economy Class: For flights that are three hours or less to your destination, the maximum allowable class of service is Economy class.
- Premium Economy Class: For flights where the total air travel time (excluding connection time) is more than three hours and less than seven hours per segment to your destination, the maximum allowable class of service is premium economy.
- Business Class: When traveling internationally (or to Hawaii/Alaska), travelers are permitted to travel in Business Class on those segments that are seven hours or more. If the routing is over budget, a premium economy or mixed-class ticket will be booked.
Group B – Funded Participants
The foundation will arrange and pay for round-trip air or train travel to the conference as well as hotel accommodations and reimbursement of local expenses. Economy-class airfare will be booked for all flights.
Group C – Unfunded Participants
Individuals in Group C will not receive financial support, but are encouraged to enjoy all conference-hosted meals.
Group D – Remote Participants
Individuals in Group D will participate in the meeting remotely.
-
Travel & Hotel
Air & Rail
For funded individuals, the foundation will arrange and pay for round-trip travel from their home city to the conference. All travel and hotel arrangements must be booked through the Simons Foundation's preferred travel agency.
Travel specifications, including preferred airline, will be accommodated provided that these specifications are reasonable and within budget.
Travel arrangements not booked through the preferred agency, including triangle trips and routing/preferred airlines outside budget, must be pre-approved by the Simons Foundation and a reimbursement quote must be obtained through the foundation’s travel agency.
All costs related to changes made to ticketed travel are to be paid for by the participant and are not reimbursable. Please contact the foundation’s travel agency for further assistance.
Personal & Rental Cars
Personal car and rental trips over 250 miles each way require prior approval from the Simons Foundation via email. Rental cars must be pre-approved by the Simons Foundation.
The James NoMad Hotel offers valet parking. Please note there are no in-and-out privileges when using the hotel's garage; participants are therefore encouraged to walk or take public transportation to the Simons Foundation.
Hotel
Funded individuals who require hotel accommodations are hosted by the foundation for a maximum of three nights at The James NoMad Hotel, arriving one day before the meeting and departing one day after the meeting. Any additional nights are at the attendee's own expense. To arrange accommodations, please register at the link included in your invitation.
The James NoMad Hotel
22 E 29th St
New York, NY 10016
(between 28th and 29th Streets)
https://www.jameshotels.com/new-york-nomad/
-
Reimbursement
Overview:
Funded individuals will be reimbursed for meals and local expenses including ground transportation. Expenses should be submitted through the foundation's online expense reimbursement platform after the meeting's conclusion. Expenses accrued as a result of meetings not directly related to the Simons Foundation-hosted meeting (a satellite collaboration meeting held at another institution, for example) will not be reimbursed by the Simons Foundation and should be paid by other sources.
Below are key reimbursement takeaways; a full policy will be provided with the final logistics email circulated approximately 2 weeks prior to the meeting’s start.
Meals:
The daily meal limit is $125; itemized receipts are required for expenses over $24 USD. The foundation DOES NOT provide a meal per diem and only reimburses actual meal expenses up to the following amounts:
- Breakfast $20
- Lunch $30
- Dinner $75
Allowable Meal Expenses
- Meals taken on travel days (when you traveled by air or train).
- Meals not provided on a meeting day, dinner for example.
- Group dinners with fellow meeting participants that are paid for by a single person will be reimbursed up to $75 per person, and the amount will count toward each individual's $125 daily meal limit.
Unallowable Meal Expenses
- Meals taken in place of those provided by the foundation (breakfast, lunch, breaks and/or dinner).
- Meals taken on days not associated with Simons Foundation-coordinated events.
- Minibar expenses.
- Ubers, Lyfts, taxis, etc., taken to and from restaurants in Manhattan (accommodations will be made for those with mobility restrictions).
Ground Transportation:
Expenses for ground transportation will be reimbursed for travel days (i.e., traveling to/from the airport or train station), and subway and bus fares while in Manhattan are also reimbursable. Transportation to/from satellite meetings is not reimbursable.
-
Attendance & Building Protocols
Attendance
In-person participants and speakers are expected to attend all meeting days. Participants receiving hotel and travel support who wish to arrive on a meeting day that concludes at 2:00 PM will be asked to attend remotely.
COVID-19 Vaccination
Individuals accessing Simons Foundation and Flatiron Institute buildings must be fully vaccinated against COVID-19.
Entry & Building Access
Upon arrival, guests will be required to show their photo ID to enter the Simons Foundation and Flatiron Institute buildings. After checking in at the meeting reception desk, guests will be able to show their meeting name badge to re-enter the building. If you forget your name badge, you will need to provide your photo ID.
The Simons Foundation and Flatiron Institute buildings are not considered “open campuses,” and meeting participants will only have access to the spaces in which the meeting will take place. All other areas are off limits without prior approval.
If you require a private space to conduct a phone call or remote meeting, please contact your meeting manager at least 48 hours ahead of time so that they may book a space for you within the foundation's room reservation system.
Guests & Children
Meeting participants are required to give 24 hours' advance notice of any guests meeting them at the Simons Foundation either before or after the meeting. Outside guests are discouraged from joining meeting activities, including meals.
With the exception of Simons Foundation and Flatiron Institute staff, ad hoc meeting participants who did not receive a meeting invitation directly from the Simons Foundation are not permitted.
Children under the age of 18 are not permitted to attend meetings at the Simons Foundation. Furthermore, the Simons Foundation does not provide childcare facilities or support of any kind. Special accommodations will be made for nursing parents.
-
Contacts
Meeting & Policy Questions
Christina Darras
Event Manager
[email protected]
Travel & Hotel Support
FCM Travel Meetings & Events
[email protected]
Hours: M-F, 8:30 AM-5:00 PM ET
+1-888-789-6639