hypergradient
Natural Hypergradient Descent: Algorithm Design, Convergence Analysis, and Parallel Implementation
Deyi Kong, Zaiwei Chen, Shuzhong Zhang, Shancong Mou
In this work, we propose Natural Hypergradient Descent (NHGD), a new method for solving bilevel optimization problems. To address the computational bottleneck in hypergradient estimation, namely the need to compute or approximate the Hessian inverse, we exploit the statistical structure of the inner optimization problem and use the empirical Fisher information matrix as an asymptotically consistent surrogate for the Hessian. This design enables a parallel optimize-and-approximate framework in which the Hessian-inverse approximation is updated synchronously with the stochastic inner optimization, reusing gradient information at negligible additional cost. Our main theoretical contribution establishes high-probability error bounds and sample-complexity guarantees for NHGD that match those of state-of-the-art optimize-then-approximate methods, while significantly reducing computation time. Empirical evaluations on representative bilevel learning tasks further demonstrate the practical advantages of NHGD, highlighting its scalability and effectiveness in large-scale machine learning settings.
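The abstract's key idea, substituting the empirical Fisher information matrix for the Hessian in the implicit-differentiation hypergradient, can be sketched numerically. Everything below is a synthetic illustration under stated assumptions (random stand-ins for the gradients and Jacobians, a small damping term for invertibility), not the paper's NHGD algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_fisher(per_sample_grads):
    """Empirical Fisher F = (1/n) * sum_i g_i g_i^T, built from per-sample
    inner-loss gradients -- the quantity reused 'at negligible cost'."""
    n = per_sample_grads.shape[0]
    return per_sample_grads.T @ per_sample_grads / n

# Toy dimensions: inner parameters w in R^d, outer parameter lam in R^p.
d, p, n = 5, 2, 200
G = rng.normal(size=(n, d))                 # stand-in per-sample inner gradients
F = empirical_fisher(G) + 1e-3 * np.eye(d)  # damping keeps the solve well-posed

grad_w_outer = rng.normal(size=d)     # stand-in for d f / d w (outer loss, inner params)
cross_jac = rng.normal(size=(p, d))   # stand-in for the mixed Jacobian d^2 g / d lam d w
grad_lam_outer = rng.normal(size=p)   # stand-in for the direct term d f / d lam

# Hypergradient with the Fisher as Hessian surrogate:
#   hg = df/dlam - (d^2 g / d lam d w) F^{-1} df/dw
v = np.linalg.solve(F, grad_w_outer)  # one linear solve replaces a Hessian inverse
hypergrad = grad_lam_outer - cross_jac @ v
```

The point of the surrogate is visible in the last two lines: only gradient outer products and a single linear solve are needed, so no second-order information about the inner loss is ever formed.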
Learning to Mutate with Hypergradient Guided Population
To address the above challenges, we propose a novel hyperparameter mutation (HPM) scheduling algorithm in this study, which adopts a population-based training framework to explicitly learn a trade-off (i.e., a mutation schedule) between hypergradient-guided local search and mutation-driven global search.
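The trade-off this snippet describes, hypergradient-guided local steps versus mutation-driven global jumps inside a population-based training loop, can be sketched on a synthetic 1-D objective. The fixed `p_mutate` probability, the toy `loss`, and all constants below are illustrative assumptions; the paper's HPM method learns the mutation schedule rather than fixing it:

```python
import random

random.seed(0)

def loss(h):
    """Toy 1-D objective over a hyperparameter h (minimum at h = 3)."""
    return (h - 3.0) ** 2

def hypergradient_step(h, lr=0.1):
    """Local search: follow the (here, analytic) hypergradient of the loss."""
    return h - lr * 2.0 * (h - 3.0)

def mutate(h, scale=1.0):
    """Global search: random perturbation of the hyperparameter."""
    return h + random.gauss(0.0, scale)

def pbt(pop_size=8, steps=50, p_mutate=0.3):
    """Population-based search mixing gradient steps with mutations.

    p_mutate plays the role of the mutation schedule, fixed here for
    illustration only.
    """
    pop = [random.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(steps):
        pop = [mutate(h) if random.random() < p_mutate else hypergradient_step(h)
               for h in pop]
        # Exploit: clone the current best member over the worst one.
        pop.sort(key=loss)
        pop[-1] = pop[0]
    return min(pop, key=loss)

best = pbt()
```

The gradient step exploits local curvature while the Gaussian mutation lets members escape to other regions; the cloning step propagates whichever strategy happened to work.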
Appendices for: Gradient-based Hyperparameter Optimization Over Long Horizons
Paul Micaelli, University of Edinburgh, paul.micaelli@ed.ac.uk
Amos Storkey, University of Edinburgh, a.storkey@ed.ac.uk
Now we return to the second part of (9); this illustrates how tight the upper bound is. We use a GeForce RTX 2080 Ti GPU for all experiments. Instead, we always carve out a validation set from our training set: the batch size is set to 128, and 1000 fixed images are used as the validation data. Here we provide the raw hypergradients corresponding to the outer optimization shown in Appendices: Figure 1.
Non-Stationary Functional Bilevel Optimization
Jason Bohne, Ieva Petrulionyte, Michael Arbel, Julien Mairal, Paweł Polak
Functional bilevel optimization (FBO) provides a powerful framework for hierarchical learning in function spaces, yet current methods are limited to static offline settings and perform suboptimally in online, non-stationary scenarios. We propose SmoothFBO, the first algorithm for non-stationary FBO with both theoretical guarantees and practical scalability. SmoothFBO introduces a time-smoothed stochastic hypergradient estimator that reduces variance through a window parameter, enabling stable outer-loop updates with sublinear regret. Importantly, the classical parametric bilevel setting is recovered as a special case of our framework, making SmoothFBO a natural extension of parametric bilevel optimization to online, non-stationary settings. Empirically, SmoothFBO consistently outperforms existing FBO methods in non-stationary hyperparameter optimization and model-based reinforcement learning, demonstrating its practical effectiveness. Together, these results establish SmoothFBO as a general, theoretically grounded, and practically viable foundation for bilevel optimization in online, non-stationary scenarios.
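The time-smoothed estimator this abstract describes, averaging the last few stochastic hypergradients over a window to cut variance, can be sketched as follows. The drifting scalar signal and the noise model are invented stand-ins for a non-stationary problem, not the paper's SmoothFBO estimator:

```python
from collections import deque
import random

random.seed(0)

def noisy_hypergradient(t):
    """Stand-in stochastic hypergradient at time t: a slowly drifting
    (non-stationary) signal plus unit-variance noise. Purely synthetic."""
    true = 1.0 + 0.01 * t
    return true + random.gauss(0.0, 1.0)

class TimeSmoothedEstimator:
    """Average the last `window` stochastic hypergradients.

    A larger window reduces noise variance (roughly by 1/window) but lags
    behind the drift -- the bias/variance trade-off the window parameter
    controls in a time-smoothed scheme.
    """
    def __init__(self, window=10):
        self.buf = deque(maxlen=window)

    def update(self, g):
        self.buf.append(g)
        return sum(self.buf) / len(self.buf)

raw = [noisy_hypergradient(t) for t in range(200)]
est = TimeSmoothedEstimator(window=20)
smoothed = [est.update(g) for g in raw]
```

The smoothed sequence tracks the drifting signal with far less scatter than the raw draws, which is what makes stable outer-loop updates possible.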