A Provably Convergent Plug-and-Play Framework for Stochastic Bilevel Optimization
Chu, Tianshu, Xu, Dachuan, Yao, Wei, Yu, Chengming, Zhang, Jin
–arXiv.org Artificial Intelligence
A Provably Convergent Plug-and-Play Framework for Stochastic Bilevel Optimization Tianshu Chu Dachuan Xu Wei Yao Chengming Yu Jin Zhang Abstract Bilevel optimization has recently attracted significant attention in machine learning due to its wide range of applications and advanced hierarchical optimization capabilities. In this paper, we propose a plug-and-play framework, named PnPBO, for developing and analyzing stochastic bilevel optimization methods. This framework integrates both modern unbiased and biased stochastic estimators into the single-loop bilevel optimization framework introduced in [9], with several improvements. In the implementation of PnPBO, all stochastic estimators for different variables can be independently incorporated, and an additional moving average technique is applied when using an unbiased estimator for the upper-level variable. In the theoretical analysis, we provide a unified convergence and complexity analysis for PnPBO, demonstrating that the adaptation of various stochastic estimators (including PAGE, ZeroSARAH, and mixed strategies) within the PnPBO framework achieves optimal sample complexity, comparable to that of single-level optimization. This resolves the open question of whether the optimal complexity bounds for solving bilevel optimization are identical to those for single-level optimization. Keyword bilevel optimization, stochastic optimization, plug-and-play, sample complexity 1 Introduction Bilevel optimization (BLO) effectively addresses challenges arising from hierarchical optimization, where the decision variables in the upper level are also involved in the lower level. In recent years, BLO has gained increasing attention due to its extensive and effective applications, including hyperparameter optimization [13], meta-learning [20], continual learning [17], and reinforcement learning [40]. In this paper, we focus on the nonconvex-strongly-convex BLO problem under classical assumptions, formulated as follows min x R d x H (x):= f (x, y ( x)) s.t. This work was partially presented at International Conference on Machine Learning (ICML) 2024 ([7]). School of Science, Beijing University of Posts and Telecommunications; National Center for Applied Mathematics Shenzhen; yucm@bupt.edu.cn The bilevel structure typically makes the objective H (x) nonconvex, except in a few special cases. Therefore, our goal is to develop efficient gradient-based algorithms to find an ϵ-stationary point of H (x), specifically, a point x that satisfies the inequality E H (x) 2 ϵ in the stochastic setting [14]. To achieve this, a promising approach is to adapt stochastic methods from single-level optimization, which we refer to as stochastic estimators, to the bilevel optimization context.
arXiv.org Artificial Intelligence
May-5-2025