AITopics | hypergradient

In this work, we propose Natural Hypergradient Descent (NHGD), a new method for solving bilevel optimization problems. To address the computational bottleneck in hypergradient estimation--namely, the need to compute or approximate Hessian inverse--we exploit the statistical structure of the inner optimization problem and use the empirical Fisher information matrix as an asymptotically consistent surrogate for the Hessian. This design enables a parallel optimize-and-approximate framework in which the Hessian-inverse approximation is updated synchronously with the stochastic inner optimization, reusing gradient information at negligible additional cost. Our main theoretical contribution establishes high-probability error bounds and sample complexity guarantees for NHGD that match those of state-of-the-art optimize-then-approximate methods, while significantly reducing computational time overhead. Empirical evaluations on representative bilevel learning tasks further demonstrate the practical advantages of NHGD, highlighting its scalability and effectiveness in large-scale machine learning settings.

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Machine Learning

2602.10905

Country:

North America > United States > Minnesota (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

bac49b876d5dfc9cd169c22ef5178ca7-Paper.pdf

Neural Information Processing SystemsFeb-10-2026, 22:06:29 GMT

evograd, hyperparameter, learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

LearningtoMutatewithHypergradientGuided Population

Neural Information Processing SystemsFeb-10-2026, 10:12:31 GMT

Toaddress theabovechallenges, wepropose anovelhyperparameter mutation (HPM) scheduling algorithm in this study, which adopts a population based training framework to explicitly learn a trade-off (i.e., a mutation schedule) between using the hypergradient-guided local search and the mutation-driven global search.

artificial intelligence, hyperparameter, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.90)

Add feedback

Appendices for: Gradient-based Hyperparameter Optimization Over Long Horizons Paul Micaelli University of Edinburgh {paul.micaelli}@ed.ac.uk Amos Storkey University of Edinburgh {a.storkey }@ed.ac.uk

Neural Information Processing SystemsFeb-8-2026, 19:46:18 GMT

Now we return to the second part of (9). This illustrates how tight the upper bound is. We use a GeForce RTX 2080 Ti GPU for all experiments. Instead, we always carve out a validation set from our training set. Figure 1 The batch size is set to 128, and 1000 fixed images are used for the validation data. Here we provide the raw hypergradients corresponding to the outer optimization shown in Appendices: Figure 1.

artificial intelligence, hypergradient, machine learning, (11 more...)

Neural Information Processing Systems

Country: Europe > Sweden > Stockholm > Stockholm (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback