AITopics | stochastic bilevel optimization

stochastic bilevel optimization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Achieving $O(\epsilon^{-1.5})$ Complexity in Hessian/Jacobian-free Stochastic Bilevel Optimization

Neural Information Processing Systems | Feb-15-2026, 10:40:52 GMT

This class of bilevel problems has been studied extensively from the theoretical perspective in recent years.

artificial intelligence, machine learning, optimization, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > Erie County > Buffalo (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum

Neural Information Processing Systems | Dec-25-2025, 08:27:24 GMT

We focus on bilevel problems where the lower level subproblem is strongly-convex and the upper level objective function is smooth. Unlike prior works which rely on two-timescale or double loop techniques, we design a stochastic momentum-assisted gradient estimator for both the upper and lower level updates. The latter allows us to control the error in the stochastic gradient updates due to inaccurate solution to both subproblems. If the upper objective function is smooth but possibly non-convex, we show that SUSTAIN requires $O(\epsilon^{-3/2})$ iterations (each using $O(1)$ samples) to find an $\epsilon$-stationary solution. The $\epsilon$-stationary solution is defined as the point whose squared norm of the gradient of the outer function is less than or equal to $\epsilon$. The total number of stochastic gradient samples required for the upper and lower level objective functions matches the best-known complexity for single-level stochastic gradient algorithms. We also analyze the case when the upper level objective function is strongly-convex.
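To make the single-loop idea and the $\epsilon$-stationarity criterion concrete, here is a minimal deterministic sketch on a toy problem. The toy objectives, step sizes, and function names are assumptions made for illustration. The sketch leaves out the stochastic sampling and the momentum correction that the abstract describes, so it is not the SUSTAIN algorithm itself.

```python
# Toy bilevel problem (hypothetical, chosen so everything has a closed form):
#   lower level: g(x, y) = 0.5 * (y - x)**2   -> y*(x) = x (strongly convex in y)
#   upper level: f(x, y) = 0.5 * (y - 1)**2 + 0.5 * x**2
# Outer function: Phi(x) = f(x, y*(x)), whose gradient is 2*x - 1.

def hypergradient(x, y):
    """Implicit-function hypergradient, evaluated at the current lower-level iterate y."""
    grad_x_f = x          # df/dx
    grad_y_f = y - 1.0    # df/dy
    hess_xy_g = -1.0      # d^2 g / dx dy
    hess_yy_g = 1.0       # d^2 g / dy^2
    return grad_x_f - hess_xy_g * grad_y_f / hess_yy_g


def is_eps_stationary(outer_grad, eps):
    """True when the squared gradient norm of the outer function is at most eps."""
    return outer_grad ** 2 <= eps


def single_loop(x=5.0, y=-3.0, alpha=0.1, beta=0.5, steps=200):
    """Update both levels once per iteration; no inner loop solves for y*(x)."""
    for _ in range(steps):
        # Both updates read the previous iterate (x, y).
        x, y = x - alpha * hypergradient(x, y), y - beta * (y - x)
    return x, y


x_final, y_final = single_loop()
outer_grad = 2.0 * x_final - 1.0  # exact gradient of Phi at x_final
print(x_final, y_final, is_eps_stationary(outer_grad, eps=1e-6))
```

On this toy problem both iterates approach 0.5, the minimizer of the outer function, even though the lower-level problem is never solved exactly at any step.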

near-optimal algorithm, stochastic bilevel optimization, underline, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.85)

SBO-RNN: Reformulating Recurrent Neural Networks via Stochastic Bilevel Optimization

Neural Information Processing Systems | Dec-25-2025, 00:18:11 GMT

In this paper we consider the training stability of recurrent neural networks (RNNs) and propose a family of RNNs, namely SBO-RNN, that can be formulated using stochastic bilevel optimization (SBO). With the help of stochastic gradient descent (SGD), we manage to convert the SBO problem into an RNN where the feedforward and backpropagation solve the lower and upper-level optimization for learning hidden states and their hyperparameters, respectively. We prove that under mild conditions there is no vanishing or exploding gradient in training SBO-RNN. Empirically we demonstrate our approach with superior performance on several benchmark datasets, with fewer parameters, less training data, and much faster convergence. Code is available at https://zhang-vislab.github.io.

reformulating recurrent neural network, sbo-rnn, stochastic bilevel optimization, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.62)

Bridging Constraints and Stochasticity: A Fully First-Order Method for Stochastic Bilevel Optimization with Linear Constraints

Phan, Cac, Wang, Kai

arXiv.org Machine Learning | Nov-18-2025

This work provides the first finite-time convergence guarantees for linearly constrained stochastic bilevel optimization using only first-order methods, requiring solely gradient information without any Hessian computations or second-order derivatives. We address the unprecedented challenge of simultaneously handling linear constraints, stochastic noise, and finite-time analysis in bilevel optimization, a combination that has remained theoretically intractable until now. While existing approaches either require second-order information, handle only unconstrained stochastic problems, or provide merely asymptotic convergence results, our method achieves finite-time guarantees using gradient-based techniques alone. We develop a novel framework that constructs hypergradient approximations via smoothed penalty functions, using approximate primal and dual solutions to overcome the fundamental challenges posed by the interaction between linear constraints and stochastic noise. Our theoretical analysis provides explicit finite-time bounds on the bias and variance of the hypergradient estimator, demonstrating how approximation errors interact with stochastic perturbations. We prove that our first-order algorithm converges to $(δ, ε)$-Goldstein stationary points using $Θ(δ^{-1}ε^{-5})$ stochastic gradient evaluations, establishing the first finite-time complexity result for this challenging problem class and representing a significant theoretical breakthrough in constrained stochastic bilevel optimization.

artificial intelligence, machine learning, optimization, (15 more...)

arXiv.org Machine Learning

arXiv:2511.09845

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Convergence Rate in Nonlinear Two-Time-Scale Stochastic Approximation with State (Time)-Dependence

Chen, Zixi, Xu, Yumin, Zhang, Ruixun

arXiv.org Artificial Intelligence | Sep-16-2025

The nonlinear two-time-scale stochastic approximation is widely studied under conditions of bounded variances in noise. Motivated by recent advances that allow for variability linked to the current state or time, we consider state- and time-dependent noises. We show that the Lyapunov function exhibits polynomial convergence rates in both cases, with the rate of polynomial decay depending on the parameters of state- or time-dependent noises. Notably, if the state noise parameters fully approach their limiting value, the Lyapunov function achieves an exponential convergence rate. We provide two numerical examples to illustrate our theoretical findings in the context of stochastic gradient descent with Polyak-Ruppert averaging and stochastic bilevel optimization.

artificial intelligence, machine learning, optimization problem, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1609/aaai.v39i15.33756

arXiv:2509.11039

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization

Chen, Lesi, Li, Junru, Zhang, Jingzhao

arXiv.org Machine Learning | Sep-4-2025

This paper studies the complexity of finding an $ε$-stationary point for stochastic bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the first-order method, F$^2$SA, achieving the $\tilde{\mathcal{O}}(ε^{-6})$ upper complexity bound for first-order smooth problems. This is slower than the optimal $Ω(ε^{-4})$ complexity lower bound in its single-level counterpart. In this work, we show that faster rates are achievable for higher-order smooth problems. We first reformulate F$^2$SA as approximating the hyper-gradient with a forward difference. Based on this observation, we propose a class of methods F$^2$SA-$p$ that uses $p$th-order finite difference for hyper-gradient approximation and improves the upper bound to $\tilde{\mathcal{O}}(p ε^{-4-2/p})$ for $p$th-order smooth problems. Finally, we demonstrate that the $Ω(ε^{-4})$ lower bound also holds for stochastic bilevel problems when the high-order smoothness holds for the lower-level variable, indicating that the upper bound of F$^2$SA-$p$ is nearly optimal in the highly smooth region $p = Ω( \log ε^{-1} / \log \log ε^{-1})$.
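The forward-difference reading of the first-order method suggests why higher-order schemes can help. The sketch below is a generic illustration on a scalar test function (`math.sin`, chosen here as an assumption), not the paper's hyper-gradient estimator. It compares a first-order forward difference with a second-order central difference at the same step size.

```python
import math


def forward_difference(f, x, h):
    """First-order accurate: the error shrinks like O(h)."""
    return (f(x + h) - f(x)) / h


def central_difference(f, x, h):
    """Second-order accurate: the error shrinks like O(h**2)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


x0, h = 1.0, 1e-3
exact = math.cos(x0)  # derivative of sin at x0
forward_error = abs(forward_difference(math.sin, x0, h) - exact)
central_error = abs(central_difference(math.sin, x0, h) - exact)
print(forward_error, central_error)
```

With the same step size and the same number of function evaluations, the central scheme is several orders of magnitude more accurate, which is the kind of gain a higher-order finite difference can exploit when the underlying function is smooth enough.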

artificial intelligence, machine learning, optimization, (20 more...)

arXiv.org Machine Learning

arXiv:2509.02937

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness

Neural Information Processing Systems | May-27-2025, 08:34:01 GMT

This paper investigates a class of stochastic bilevel optimization problems where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level problem is strongly convex. These problems have significant applications in sequential data learning, such as text classification using recurrent neural networks. The unbounded smoothness is characterized by the smoothness constant of the upper-level function scaling linearly with the gradient norm, lacking a uniform upper bound. Existing state-of-the-art algorithms require $\widetilde{O}(\epsilon^{-4})$ oracle calls of stochastic gradient or Hessian/Jacobian-vector product to find an $\epsilon$-stationary point. However, it remains unclear if we can further improve the convergence rate when the assumptions for the function in the population level also hold for each random realization almost surely (e.g., Lipschitzness of each realization of the stochastic gradient).
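One common way to formalize a smoothness constant that grows linearly with the gradient norm, taken from the single-level literature, is relaxed $(L_0, L_1)$-smoothness. It is shown here as an assumption for illustration; the paper's exact bilevel condition may differ in form.

```latex
\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x .
```

For example, $f(x) = e^{x}$ on the real line satisfies this with $L_0 = 0$ and $L_1 = 1$, although $f''$ has no uniform upper bound.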

accelerated algorithm, algorithm, stochastic bilevel optimization, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum

Neural Information Processing Systems | Jan-19-2025, 15:02:19 GMT

We focus on bilevel problems where the lower level subproblem is strongly-convex and the upper level objective function is smooth. Unlike prior works which rely on two-timescale or double loop techniques, we design a stochastic momentum-assisted gradient estimator for both the upper and lower level updates. The latter allows us to control the error in the stochastic gradient updates due to inaccurate solution to both subproblems. If the upper objective function is smooth but possibly non-convex, we show that SUSTAIN requires $O(\epsilon^{-3/2})$ iterations (each using $O(1)$ samples) to find an $\epsilon$-stationary solution. The $\epsilon$-stationary solution is defined as the point whose squared norm of the gradient of the outer function is less than or equal to $\epsilon$. The total number of stochastic gradient samples required for the upper and lower level objective functions matches the best-known complexity for single-level stochastic gradient algorithms.

objective function, stochastic bilevel optimization, underline, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.91)
