Optimizing ML Training with Metagradient Descent
Logan Engstrom, Andrew Ilyas, Benjamin Chen, Axel Feldmann, William Moses, Aleksander Madry
A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale. We then introduce a "smooth model training" framework that enables effective optimization using metagradients. With metagradient descent (MGD), we greatly improve on existing dataset selection methods, outperform accuracy-degrading data poisoning attacks by an order of magnitude, and automatically find competitive learning rate schedules.
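The abstract's core idea, a metagradient as a gradient taken through model training, can be illustrated with a minimal sketch (not the paper's implementation) that unrolls a few SGD steps in JAX and differentiates a validation loss with respect to a metaparameter, here the learning rate. All function names, data, and the choice of metaparameter are illustrative assumptions.

```python
# Hedged sketch: a metagradient computed by differentiating through a
# short unrolled training loop. This is NOT the paper's algorithm, which
# targets scale; it only illustrates the concept of "gradients through
# model training."
import jax
import jax.numpy as jnp

def train_loss(w, X, y):
    # Simple linear-regression loss for the inner training problem.
    pred = X @ w
    return jnp.mean((pred - y) ** 2)

def meta_objective(lr, X_train, y_train, X_val, y_val, steps=10):
    # Unroll `steps` SGD updates; the learning rate `lr` is the
    # metaparameter we differentiate with respect to.
    w = jnp.zeros(X_train.shape[1])
    grad_fn = jax.grad(train_loss)
    for _ in range(steps):
        w = w - lr * grad_fn(w, X_train, y_train)
    # The meta-objective is the validation loss of the trained weights.
    return train_loss(w, X_val, y_val)

# Synthetic data from a known linear model (illustrative only).
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
true_w = jnp.array([1.0, -2.0, 0.5])
X_tr = jax.random.normal(k1, (32, 3))
y_tr = X_tr @ true_w
X_va = jax.random.normal(k2, (16, 3))
y_va = X_va @ true_w

# Metagradient: d(validation loss) / d(learning rate).
metagrad = jax.grad(meta_objective)(0.1, X_tr, y_tr, X_va, y_va)
```

A metagradient-descent step would then update the metaparameter, e.g. `lr = lr - meta_lr * metagrad`, and repeat; the paper's contribution is making this kind of computation tractable and well-behaved at large-model scale.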
Mar-17-2025