381 Publications
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, A. Bietti
posteriordb: Testing, Benchmarking and Developing Bayesian Inference Algorithms
Måns Magnusson, Jakob Torgander, Paul-Christian Bürkner, Lu Zhang, B. Carpenter, Aki Vehtari
Good Rates From Bad Coordinates: The Exponential Average Time-dependent Rate Approach
Nicodemo Mazzaferro, Subarna Sasmal, P. Cossio, Glen M. Hocky
Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations
Kazusato Oko, Yujin Song, Taiji Suzuki, D. Wu
High-order and adaptive optical conductivity calculations using Wannier interpolation
Lorenzo Van Muñoz, J. Kaye, A. Barnett, Sophie Beck
Variational Inference for Uncertainty Quantification: an Analysis of Trade-offs
C. Margossian, L. Pillaud-Vivien, L. Saul
How Truncating Weights Improves Reasoning in Language Models
Lei Chen, Joan Bruna, A. Bietti