Closing the Gap: Tighter Analysis of Alternating Stochastic Gradient Methods for Bilevel Problems

Neural Information Processing Systems 

Stochastic nested optimization, including stochastic bilevel, min-max, and compositional optimization, is gaining popularity in many machine learning applications. Although the three problems share a common nested structure, existing works often treat them separately and thus develop problem-specific algorithms and analyses. Among various exciting developments, simple SGD-type updates (potentially on multiple variables) are still prevalent in solving this class of nested problems, but they are believed to converge more slowly than SGD applied to non-nested problems.
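To make the class of methods concrete, the sketch below runs alternating SGD-type updates on a toy quadratic bilevel problem: one noisy gradient step on the lower-level variable, then one noisy hypergradient step on the upper-level variable. This is only an illustration of the update pattern, not the algorithm analyzed in the paper; the quadratic problem data, the step sizes `alpha` and `beta`, and the Gaussian noise level `sigma` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bilevel problem (assumed for illustration):
#   upper level: min_x f(x, y*(x)) = 0.5 * ||y*(x) - b||^2
#   lower level: y*(x) = argmin_y g(x, y) = 0.5 * ||y - A @ x||^2
# Since y*(x) = A @ x, the overall objective is F(x) = 0.5 * ||A @ x - b||^2,
# and the implicit function theorem gives the hypergradient
#   grad F(x) = A.T @ (y*(x) - b).
d_x, d_y = 5, 8
A = rng.standard_normal((d_y, d_x))
b = rng.standard_normal(d_y)

x = np.zeros(d_x)
y = np.zeros(d_y)
alpha, beta = 0.02, 0.1   # upper- and lower-level step sizes (assumed)
sigma = 0.01              # gradient noise level (assumed)

for t in range(2000):
    # Lower-level SGD step: noisy gradient of g(x, y) in y.
    grad_y_g = (y - A @ x) + sigma * rng.standard_normal(d_y)
    y = y - beta * grad_y_g

    # Upper-level SGD step: stochastic hypergradient, with the current
    # inner iterate y standing in for the exact lower-level solution y*(x).
    hypergrad = A.T @ (y - b) + sigma * rng.standard_normal(d_x)
    x = x - alpha * hypergrad

print("final upper-level loss:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```

The key design point visible in the sketch is that the upper-level update never waits for the lower-level subproblem to be solved exactly: the two variables are updated in alternation with a single stochastic gradient step each, which is the SGD-type scheme whose convergence rate the paper analyzes.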