AITopics | saddle point

Reinforcement learning (RL) often has a hierarchical structure, where an upper-level (UL) learner selects model parameters and a lower-level (LL) decision-making process responds, naturally leading to a bilevel optimization problem. Most existing bilevel RL methods assume a single-policy LL Markov decision process (MDP), and therefore fail to capture competitive structures arising in applications such as incentive design, where multiple policies interact. We study bilevel optimization problems in which the LL problem is a regularized min-max zero-sum Markov game and the UL objective is optimized through the saddle-point equilibrium induced by the LL game. In this work, we propose penalty-augmented Nikaido-Isoda descent-ascent (PANDA), a penalty-based first-order policy-gradient method based on the Nikaido-Isoda function. By exploiting the min-max game structure, PANDA avoids computing UL hypergradients and does not require second-order information. We prove that PANDA converges to stationary points without convexity assumptions on either the UL or LL objectives. Moreover, PANDA reaches an $ε$-stationary point in $\tilde{\mathcal{O}}(ε^{-1})$ iterations with sample complexity $\tilde{\mathcal{O}}(ε^{-3})$, matching the best-known rates for bilevel RL with single-policy LL MDPs. Experiments demonstrate the superior performance of PANDA over closely related baselines.
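As a toy illustration of why regularization matters for a lower-level saddle-point problem, the sketch below runs plain simultaneous gradient descent-ascent on a two-variable objective. It is not PANDA: the objective `f(x, y) = x*y + (lam/2)*x**2 - (lam/2)*y**2`, the step size, and the function name `gda` are all assumptions chosen for illustration.

```python
import math

def gda(lam, step=0.1, iters=200, x=1.0, y=1.0):
    """Simultaneous gradient descent-ascent on the toy objective
    f(x, y) = x*y + (lam/2)*x**2 - (lam/2)*y**2 (hypothetical example).
    x minimizes, y maximizes; the unique saddle point is (0, 0).
    """
    for _ in range(iters):
        gx = y + lam * x      # df/dx
        gy = x - lam * y      # df/dy
        # Descent step for x, ascent step for y, applied simultaneously.
        x, y = x - step * gx, y + step * gy
    return x, y

reg = gda(lam=1.0)     # regularized game
unreg = gda(lam=0.0)   # purely bilinear game
dist_reg = math.hypot(*reg)      # distance from the saddle point (0, 0)
dist_unreg = math.hypot(*unreg)
```

With `lam = 1.0` the iterates contract toward (0, 0). With `lam = 0.0` the same update spirals outward, because each step multiplies the distance from the origin by `sqrt(1 + step**2)`.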

artificial intelligence, machine learning, optimization problem, (14 more...)

arXiv.org Machine Learning

2605.26654

Country: Asia (0.27)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)


0bed45bd5774ffddc95ffe500024f628-Paper.pdf

Neural Information Processing Systems, May-1-2026, 01:41:30 GMT

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.68)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)


Finding Local Minima Efficiently in Decentralized Optimization

Neural Information Processing Systems, Apr-29-2026, 17:19:50 GMT

In this paper we study the second-order optimality of decentralized stochastic algorithms that escape saddle points efficiently in nonconvex optimization problems. We propose a new pure gradient-based decentralized stochastic algorithm, PEDESTAL, with a novel convergence analysis framework to address the technical challenges unique to the decentralized stochastic setting. Our method is the first decentralized stochastic algorithm to achieve second-order optimality with non-asymptotic analysis. We provide theoretical guarantees with a gradient complexity of $O(ε^{-3})$ to find an $O(ε, \sqrt{ε})$-second-order stationary point, which matches state-of-the-art results of centralized counterparts or of decentralized methods for finding first-order stationary points. We also conduct two decentralized tasks in our experiments, a matrix sensing task with synthetic data and a matrix factorization task with a real-world dataset, to validate the performance of our method.

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)


b7500454af92cf3934eb1cc2d59abbdf-Paper-Conference.pdf

Neural Information Processing Systems, Apr-29-2026, 13:09:28 GMT

artificial intelligence, machine learning, optimization, (14 more...)

Neural Information Processing Systems

Country: Asia > China (0.29)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.94)


54eea69746513c0b90bbe6227b6f46c3-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 23:23:38 GMT

artificial intelligence, iteration, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)


Escaping Saddle Points with Compressed SGD

Neural Information Processing Systems, Apr-25-2026, 23:23:34 GMT

Stochastic gradient descent (SGD) is a prevalent optimization technique for large-scale distributed machine learning. While SGD computation can be efficiently divided between multiple machines, communication typically becomes a bottleneck in the distributed setting. Gradient compression methods can be used to alleviate this problem, and a recent line of work shows that SGD augmented with gradient compression converges to an ε-first-order stationary point. In this paper we extend these results to convergence to an ε-second-order stationary point (ε-SOSP), which is, to the best of our knowledge, the first result of this type. In addition, we show that, when the stochastic gradient is not Lipschitz, compressed SGD with the RANDOMK compressor converges to an ε-SOSP with the same number of iterations as uncompressed SGD [25], while improving the total communication by a factor of $Θ(\sqrt{d}\,ε^{-3/4})$, where $d$ is the dimension of the optimization problem. We present additional results for the cases when the compressor is arbitrary and when the stochastic gradient is Lipschitz.
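To make the idea of gradient compression concrete, here is a minimal sketch of a random-k sparsifier of the kind the abstract refers to as RANDOMK. It keeps k uniformly chosen coordinates of a gradient and zeroes the rest, so a worker would only need to send k values and their indices. The function name `random_k` and the `d / k` rescaling (which makes the compressed vector unbiased) are assumptions for illustration, not details taken from the paper.

```python
import random

def random_k(grad, k, rng):
    """Keep k uniformly chosen coordinates of grad and zero the others.

    Assumption: kept entries are rescaled by d / k so that the
    compressed vector equals grad in expectation.
    """
    d = len(grad)
    if not 0 < k <= d:
        raise ValueError("k must satisfy 0 < k <= len(grad)")
    kept = set(rng.sample(range(d), k))   # indices that survive
    scale = d / k
    return [g * scale if i in kept else 0.0 for i, g in enumerate(grad)]

rng = random.Random(0)
grad = [1.0, -2.0, 3.0, 0.5, -0.25, 4.0]
compressed = random_k(grad, k=2, rng=rng)
nonzero = sum(1 for c in compressed if c != 0.0)   # number of values sent
identity = random_k(grad, k=len(grad), rng=rng)    # k = d keeps everything
```

Setting `k` equal to the dimension recovers the original gradient, while smaller `k` trades communication for variance in the update.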

artificial intelligence, machine learning, saddle point, (15 more...)

Neural Information Processing Systems

Country: North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.99)


54e8912427a8d007ece906c577fdca60-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 23:22:39 GMT

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)


simple-saddle-camera-version

张辰逸

Neural Information Processing Systems, Apr-25-2026, 17:14:13 GMT

Escaping saddle points is a central research topic in nonconvex optimization. In this paper, we propose a simple gradient-based algorithm such that for a smooth function $f\colon \mathbb{R}^n \to \mathbb{R}$, it outputs an $ε$-approximate second-order stationary point in $O(\log n / ε^{1.75})$ iterations. Compared to the previous state-of-the-art algorithms by Jin et al. with $O(\log^4 n / ε^{2})$ or $O(\log^6 n / ε^{1.75})$ iterations, our algorithm is polynomially better in terms of $\log n$ and matches their complexities in terms of $1/ε$.
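For readers unfamiliar with the problem, the sketch below shows what "escaping a saddle point" means on a two-variable toy function. It uses generic gradient descent with a single random perturbation, not the algorithm proposed in the paper. The function `f(x, y) = x**4/4 - x**2/2 + y**2/2` (saddle at the origin, minima at (1, 0) and (-1, 0)), the helper names, and all parameter values are assumptions for illustration.

```python
import math
import random

def grad(x, y):
    # Gradient of the toy function f(x, y) = x**4/4 - x**2/2 + y**2/2.
    return x**3 - x, y

def perturbed_gd(x, y, radius, step=0.1, tol=1e-8, iters=500, seed=0):
    """Gradient descent that adds one random perturbation of size
    `radius` the first time the gradient (nearly) vanishes."""
    rng = random.Random(seed)
    kicked = False
    for _ in range(iters):
        gx, gy = grad(x, y)
        if math.hypot(gx, gy) < tol and not kicked:
            # Stationary point reached: nudge the iterate once.
            x += rng.uniform(-radius, radius)
            y += rng.uniform(-radius, radius)
            kicked = True
            continue
        x, y = x - step * gx, y - step * gy
    return x, y

stuck = perturbed_gd(0.0, 0.0, radius=0.0)     # no perturbation: stays at the saddle
escaped = perturbed_gd(0.0, 0.0, radius=1e-3)  # perturbation: reaches a minimum
```

Started exactly at the saddle, unperturbed gradient descent never moves, while the perturbed run slides off along the negative-curvature direction and settles at one of the two minima.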

algorithm, artificial intelligence, machine learning, (13 more...)

Neural Information Processing Systems

Country: Asia (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)
