AITopics | lower-level problem

Penalty-Based First-Order Methods for Bilevel Optimization with Minimax and Constrained Lower-Level Problems

Shen, Yiyang, He, Yutian, Wang, Weiran, Lin, Qihang

arXiv.org Machine LearningMay-11-2026

We study a class of bilevel optimization problems in which both the upper- and lower-level problems have minimax structures. This setting captures a broad range of emerging applications. Despite the extensive literature on bilevel optimization and minimax optimization separately, existing methods mainly focus on bilevel optimization with lower-level minimization problems, often under strong convexity assumptions, and are not directly applicable to the minimax lower-level setting considered here. To address this gap, we develop penalty-based first-order methods for bilevel minimax optimization without requiring strong convexity of the lower-level problem. In the deterministic setting, we establish that the proposed method finds an $ε$-KKT point with $\tilde{O}(ε^{-4})$ oracle complexity. We further show that bilevel problems with convex constrained lower-level minimization can be reformulated as special cases of our framework via Lagrangian duality, leading to an $\tilde{O}(ε^{-4})$ complexity bound that improves upon the existing $\tilde{O}(ε^{-7})$ result. Finally, we extend our approach to the stochastic setting, where only stochastic gradient oracles are available, and prove that the proposed stochastic method finds a nearly $ε$-KKT point with $\tilde{O}(ε^{-9})$ oracle complexity.

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Machine Learning

2605.08006

Country: North America > United States > Iowa (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

f77d9409647c096789067c09455858a2-Paper-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 08:26:48 GMT

artificial intelligence, estimator, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback

3de568f8597b94bda53149c7d7f5958c-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 13:46:09 GMT

artificial intelligence, machine learning, optimization problem, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.75)

Add feedback

Projection-Free Methods for Stochastic Simple Bilevel Optimization with Convex Lower-level Problem

Neural Information Processing SystemsApr-25-2026, 03:10:13 GMT

In this paper, we study a class of stochastic bilevel optimization problems, also known as stochastic simple bilevel optimization, where we minimize a smooth stochastic objective function over the optimal solution set of another stochastic convex optimization problem. We introduce novel stochastic bilevel optimization methods that locally approximate the solution set of the lower-level problem via a stochastic cutting plane, and then run a conditional gradient update with variance reduction techniques to control the error induced by using stochastic gradients. For the case that the upper-level function is convex, our method requires O(max{1/ϵ2f,1/ϵ2g}) stochastic oracle queries to obtain a solution that is ϵfoptimal for the upper-level and ϵg-optimal for the lower-level. This guarantee improves the previous best-known complexity of O(max{1/ϵ4f,1/ϵ4g}). Moreover, for the case that the upper-level function is non-convex, our method requires at most O(max{1/ϵ3f,1/ϵ3g})stochastic oracle queries to find an (ϵf,ϵg)-stationary point. In the finite-sum setting, we show that the number of stochastic oracle calls required by our method are O( n/ϵ) and O( n/ϵ2) for the convex and non-convex settings, respectively, where ϵ = min{ϵf,ϵg}.

artificial intelligence, machine learning, probability 1, (16 more...)

Neural Information Processing Systems

Technology: