Convergence Properties of Stochastic Hypergradients

Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo

arXiv.org Machine Learning 

Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step in the design of optimization algorithms for bilevel problems is the efficient computation of the gradient of the upper-level objective (the hypergradient). In this work, we study stochastic approximation schemes for the hypergradient, which are important when the lower-level problem is empirical risk minimization on a large dataset. We provide iteration complexity bounds for the mean square error of the hypergradient approximation, under the assumption that the lower-level problem is accessible only through a stochastic mapping which is a contraction in expectation. Preliminary numerical experiments support our theoretical analysis.
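To make the setting concrete, below is a minimal NumPy sketch (not the paper's algorithm) of the scenario the abstract describes: the lower-level problem is ridge regression solved only through a stochastic mapping, here a minibatch gradient step, which is a contraction in expectation for a small enough step size, and the hypergradient of a validation loss is approximated by implicit differentiation with a truncated stochastic Neumann series. The data, constants, and helper names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: training set (lower level) and validation set (upper level).
n, n_val, d, batch = 200, 50, 10, 20
X, X_val = rng.normal(size=(n, d)), rng.normal(size=(n_val, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
y_val = X_val @ w_true + 0.1 * rng.normal(size=n_val)

lam, eta = 0.1, 0.05  # hyperparameter and step size (small eta -> contraction)

def stochastic_map(w, lam, idx):
    """One minibatch gradient step on the ridge objective.

    This plays the role of the stochastic mapping Phi(w, lam): its expectation
    is the full-batch step, which is a contraction for small enough eta."""
    Xb, yb = X[idx], y[idx]
    grad = 2.0 / len(idx) * Xb.T @ (Xb @ w - yb) + 2.0 * lam * w
    return w - eta * grad

# Lower level: run t stochastic fixed-point iterations toward w(lam).
w = np.zeros(d)
for _ in range(500):
    idx = rng.choice(n, size=batch, replace=False)
    w = stochastic_map(w, lam, idx)

# Upper-level objective E(w) = ||X_val w - y_val||^2 / n_val and its gradient.
grad_E = 2.0 / n_val * X_val.T @ (X_val @ w - y_val)

# Implicit differentiation: the hypergradient is v^T dPhi/dlam with
# v = (I - dPhi/dw)^{-T} grad_E, approximated by a truncated Neumann series
# where each Jacobian-vector product uses a fresh minibatch (stochastic).
K = 100
v, p = grad_E.copy(), grad_E.copy()
for _ in range(K):
    idx = rng.choice(n, size=batch, replace=False)
    Xb = X[idx]
    # (dPhi/dw)^T p = p - eta * (2/batch * Xb^T Xb + 2 lam I) p
    p = p - eta * (2.0 / batch * Xb.T @ (Xb @ p) + 2.0 * lam * p)
    v += p

# For the ridge step, dPhi/dlam = -2 * eta * w, so the estimate is:
hypergrad = v @ (-2.0 * eta * w)
print("stochastic hypergradient estimate:", hypergrad)
```

A finite-difference check on lam (re-solving the lower level at lam ± epsilon and differencing the validation losses) is a simple way to sanity-check such an estimate; the mean square error of the estimate is the quantity the paper's iteration complexity bounds control.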
