Supplementary Materials

A Expanded Related Work

Neural Information Processing Systems 

A number of gradient-based bilevel algorithms have been proposed via AID- and ITD-based hypergradient approximations. Our study develops a unified convergence analysis for all N and Q regimes. A variety of stochastic bilevel optimization algorithms have also been proposed recently. For all loop sizes, we set the hyperparameters to achieve the best complexity subject to convergence being guaranteed. We note that a similar conclusion for AID-BiO (e.g.,

All experiments are run on a single NVIDIA Tesla P100 GPU.