Optimizing ML Training with Metagradient Descent
Logan Engstrom, Andrew Ilyas, Benjamin Chen, Axel Feldmann, William Moses, Aleksander Madry
A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale. We then introduce a "smooth model training" framework that enables effective optimization using metagradients. With metagradient descent (MGD), we greatly improve on existing dataset selection methods, outperform accuracy-degrading data poisoning attacks by an order of magnitude, and automatically find competitive learning rate schedules.
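The abstract's core idea, a metagradient as a gradient taken through model training, can be illustrated with a minimal sketch (not the paper's implementation) that unrolls a few SGD steps in JAX and differentiates a validation loss with respect to a metaparameter, here the learning rate. All function names, data, and the choice of metaparameter are illustrative assumptions.

```python
# Hedged sketch: a metagradient computed by differentiating through a
# short unrolled training loop. This is NOT the paper's algorithm, which
# targets scale; it only illustrates the concept of "gradients through
# model training."
import jax
import jax.numpy as jnp

def train_loss(w, X, y):
    # Simple linear-regression loss for the inner training problem.
    pred = X @ w
    return jnp.mean((pred - y) ** 2)

def meta_objective(lr, X_train, y_train, X_val, y_val, steps=10):
    # Unroll `steps` SGD updates; the learning rate `lr` is the
    # metaparameter we differentiate with respect to.
    w = jnp.zeros(X_train.shape[1])
    grad_fn = jax.grad(train_loss)
    for _ in range(steps):
        w = w - lr * grad_fn(w, X_train, y_train)
    # The meta-objective is the validation loss of the trained weights.
    return train_loss(w, X_val, y_val)

# Synthetic data from a known linear model (illustrative only).
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
true_w = jnp.array([1.0, -2.0, 0.5])
X_tr = jax.random.normal(k1, (32, 3))
y_tr = X_tr @ true_w
X_va = jax.random.normal(k2, (16, 3))
y_va = X_va @ true_w

# Metagradient: d(validation loss) / d(learning rate).
metagrad = jax.grad(meta_objective)(0.1, X_tr, y_tr, X_va, y_va)
```

A metagradient-descent step would then update the metaparameter, e.g. `lr = lr - meta_lr * metagrad`, and repeat; the paper's contribution is making this kind of computation tractable and well-behaved at large-model scale.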
Mar-17-2025