Goto

Collaborating Authors

 saddle point


Bilevel Optimization over Saddle Points of Zero-Sum Markov Games

arXiv.org Machine Learning

Reinforcement learning (RL) often has a hierarchical structure, where an upper-level (UL) learner selects model parameters and a lower-level (LL) decision-making process responds, naturally leading to a bilevel optimization problem. Most existing bilevel RL methods assume a single-policy LL Markov decision process (MDP), and therefore fail to capture competitive structures arising in applications such as incentive design, where multiple policies interact. We study bilevel optimization problems in which the LL problem is a regularized min-max zero-sum Markov game and the UL objective is optimized through the saddle-point equilibrium induced by the LL game. In this work, we propose penalty-augmented Nikaido-Isoda descent-ascent (PANDA), a penalty-based first-order policy-gradient method based on the Nikaido-Isoda function. By exploiting the min-max game structure, PANDA avoids computing UL hypergradients and does not require second-order information. We prove that PANDA converges to stationary points without convexity assumptions on either the UL or LL objectives. Moreover, PANDA reaches an $ฮต$-stationary point in $\tilde{\mathcal{O}}(ฮต^{-1})$ iterations with sample complexity $\tilde{\mathcal{O}}(ฮต^{-3})$, matching the best-known rates for bilevel RL with single-policy LL MDPs. Experiments demonstrate the superior performance of PANDA over closely related baselines.



Finding Local Minima Efficiently in Decentralized Optimization

Neural Information Processing Systems

In this paper we study the second-order optimality of decentralized stochastic algorithm that escapes saddle point efficiently for nonconvex optimization problems. We propose a new pure gradient-based decentralized stochastic algorithm PEDESTAL with a novel convergence analysis framework to address the technical challenges unique to the decentralized stochastic setting. Our method is the first decentralized stochastic algorithm to achieve second-order optimality with non-asymptotic analysis. We provide theoretical guarantees with the gradient complexity of O(ฯต 3)to find O(ฯต, ฯต)-second-order stationary point, which matches state-of-the-art results of centralized counterparts or decentralized methods to find first-order stationary point. We also conduct two decentralized tasks in our experiments, a matrix sensing task with synthetic data and a matrix factorization task with a real-world dataset to validate the performance of our method.




Escaping Saddle Points with Compressed SGD

Neural Information Processing Systems

Stochastic gradient descent (SGD) is a prevalent optimization technique for largescale distributed machine learning. While SGD computation can be efficiently divided between multiple machines, communication typically becomes a bottleneck in the distributed setting. Gradient compression methods can be used to alleviate this problem, and a recent line of work shows that SGD augmented with gradient compression converges to an ฮต-first-order stationary point. In this paper we extend these results to convergence to an ฮต-second-order stationary point (ฮต-SOSP), which is to the best of our knowledge the first result of this type. In addition, we show that, when the stochastic gradient is not Lipschitz, compressed SGD with RANDOMK compressor converges to an ฮต-SOSP with the same number of iterations as uncompressed SGD [25], while improving the total communication by a factor of ฮ˜( dฮต 3/4), where dis the dimension of the optimization problem. We present additional results for the cases when the compressor is arbitrary and when the stochastic gradient is Lipschitz.



simple-saddle-camera-version

Neural Information Processing Systems

Escaping saddle points is a central research topic in nonconvex optimization. In this paper, we propose a simple gradient-based algorithm such that for a smooth function f: Rn!R, it outputs an -approximate second-order stationary point in O(logn/ 1.75)iterations. Compared to the previous state-of-the-art algorithms by Jin et al. with O(log4 n/ 2) or O(log6 n/ 1.75) iterations, our algorithm is polynomially better in terms of logn and matches their complexities in terms of 1/ .