Smooth Minimax Optimization


Efficient Algorithms for Smooth Minimax Optimization

Neural Information Processing Systems

This paper studies first order methods for solving smooth minimax optimization problems $\min_x \max_y g(x,y)$ where $g(\cdot,\cdot)$ is smooth and $g(x,\cdot)$ is concave for each $x$. In terms of $g(\cdot,y)$, we consider two settings -- strongly convex and nonconvex -- and improve upon the best known rates in both. For strongly-convex $g(\cdot, y),\ \forall y$, we propose a new direct optimal algorithm combining Mirror-Prox and Nesterov's AGD, and show that it can find global optimum in $\widetilde{O}\left(1/k^2 \right)$ iterations, improving over current state-of-the-art rate of $O(1/k)$. We use this result along with an inexact proximal point method to provide $\widetilde{O}\left(1/k^{1/3} \right)$ rate for finding stationary points in the nonconvex setting where $g(\cdot, y)$ can be nonconvex. This improves over current best-known rate of $O(1/k^{1/5})$.
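The Mirror-Prox component referenced in the abstract can be illustrated with a minimal extragradient sketch on a toy strongly-convex-concave quadratic. This is an illustrative baseline only, not the paper's DIAG algorithm; the objective $g$, step size, and iteration count below are hypothetical choices.

```python
import numpy as np

# Toy strongly-convex-concave objective g(x, y) = 0.5*mu*x^2 + x*y - 0.5*y^2,
# whose unique saddle point is (0, 0). Illustrative only, not the paper's DIAG method.
mu = 1.0
gx = lambda x, y: mu * x + y      # partial derivative of g in x
gy = lambda x, y: x - y           # partial derivative of g in y

def extragradient(x, y, eta=0.2, iters=200):
    """Mirror-Prox / extragradient with Euclidean prox: take a half-step
    to a midpoint, then a full step using the midpoint's gradient."""
    for _ in range(iters):
        xm = x - eta * gx(x, y)   # descent half-step in x
        ym = y + eta * gy(x, y)   # ascent half-step in y
        x = x - eta * gx(xm, ym)  # full step with the midpoint gradient
        y = y + eta * gy(xm, ym)
    return x, y

x, y = extragradient(1.0, -1.0)
print(abs(x) < 1e-3 and abs(y) < 1e-3)  # True: iterates converge to the saddle (0, 0)
```

Plain simultaneous gradient descent-ascent can cycle on such objectives; the midpoint gradient is what makes the extragradient step converge.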



Reviews: Efficient Algorithms for Smooth Minimax Optimization

Neural Information Processing Systems

This paper aims at solving the saddle point problem $\min_{x} \max_{y} g(x, y)$ where $g(x, \cdot)$ is concave for each $x$ and $g(\cdot, y)$ is either strongly convex or nonconvex for every $y$. The authors introduce a new algorithm called DIAG, which combines Mirror-Prox and Nesterov's AGD to solve the minimax problem. For the case where $g$ is strongly convex with respect to $x$, the authors show that DIAG has a convergence rate of $\mathcal{O}(1/k^2)$ when the suboptimality of a pair of points $(\hat{x},\hat{y})$ is defined as the primal-dual optimality gap $\max_{y} g(\hat{x},y) - \min_{x} g(x, \hat{y})$. For the case where $g$ is nonconvex with respect to $x$, the authors show that DIAG finds an $\epsilon$-first-order stationary point (FOSP) after at most $\mathcal{O}(1/\epsilon^3)$ gradient evaluations. The convergence criterion considered in this paper for solving a strongly convex-concave problem is interesting.
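The primal-dual optimality gap used as the suboptimality measure above can be evaluated in closed form for a toy quadratic. The specific $g$ below is a hypothetical example chosen so that the inner max and min are explicit; the paper treats general smooth $g$.

```python
# Primal-dual optimality gap max_y g(xh, y) - min_x g(x, yh) for the toy
# strongly-convex-concave quadratic g(x, y) = 0.5*x^2 + x*y - 0.5*y^2.
# (Hypothetical example; the gap is nonnegative and zero only at the saddle.)
def g(x, y):
    return 0.5 * x**2 + x * y - 0.5 * y**2

def gap(xh, yh):
    # For this g, the inner max over y is attained at y = xh and the inner
    # min over x at x = -yh, so the gap reduces to xh**2 + yh**2.
    best_y = xh       # argmax_y g(xh, y)
    best_x = -yh      # argmin_x g(x, yh)
    return g(xh, best_y) - g(best_x, yh)

print(gap(0.0, 0.0))  # 0.0 at the saddle point
print(gap(1.0, 2.0))  # 5.0 = 1**2 + 2**2
```

Unlike gradient-norm criteria, this gap certifies global suboptimality for both players at once, which is why it is the natural measure in the strongly convex-concave setting.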


Reviews: Efficient Algorithms for Smooth Minimax Optimization

Neural Information Processing Systems

All reviewers agreed that this paper makes an interesting contribution to NeurIPS. Please make sure to take the reviewers' comments into consideration for the camera-ready version, in particular improving the clarity of the presentation and making the overall statements more precise.


Lifted Primal-Dual Method for Bilinearly Coupled Smooth Minimax Optimization

Thekumparampil, Kiran Koshy, He, Niao, Oh, Sewoong

arXiv.org Machine Learning

We study the bilinearly coupled minimax problem: $\min_{x} \max_{y} f(x) + y^\top A x - h(y)$, where $f$ and $h$ are both strongly convex smooth functions and admit first-order gradient oracles. Surprisingly, no known first-order algorithms have hitherto achieved the lower complexity bound of $\Omega((\sqrt{\frac{L_x}{\mu_x}} + \frac{\|A\|}{\sqrt{\mu_x \mu_y}} + \sqrt{\frac{L_y}{\mu_y}}) \log(\frac{1}{\varepsilon}))$ for solving this problem up to an $\varepsilon$ primal-dual gap in the general parameter regime, where $L_x, L_y,\mu_x,\mu_y$ are the corresponding smoothness and strong convexity constants. We close this gap by devising the first optimal algorithm, the Lifted Primal-Dual (LPD) method. Our method lifts the objective into an extended form that allows both the smooth terms and the bilinear term to be handled optimally and seamlessly with the same primal-dual framework. Besides optimality, our method yields a desirably simple single-loop algorithm that uses only one gradient oracle call per iteration. Moreover, when $f$ is just convex, the same algorithm applied to a smoothed objective achieves the nearly optimal iteration complexity. We also provide a direct single-loop algorithm, using the LPD method, that achieves the iteration complexity of $O(\sqrt{\frac{L_x}{\varepsilon}} + \frac{\|A\|}{\sqrt{\mu_y \varepsilon}} + \sqrt{\frac{L_y}{\varepsilon}})$. Numerical experiments on quadratic minimax problems and policy evaluation problems further demonstrate the fast convergence of our algorithm in practice.
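As a point of reference for this problem template, a generic primal-dual iteration (Chambolle-Pock style, not the LPD method itself) can be sketched for the simple choices $f(x) = \frac{\mu_x}{2}\|x\|^2$ and $h(y) = \frac{\mu_y}{2}\|y\|^2$, where both proximal steps have closed form. The matrix $A$, step sizes, and iteration count below are hypothetical choices for illustration.

```python
import numpy as np

# Baseline primal-dual iteration for min_x max_y f(x) + y^T A x - h(y)
# with f(x) = 0.5*mu_x*||x||^2 and h(y) = 0.5*mu_y*||y||^2, so both
# proximal steps are closed-form. Illustrative sketch, not the LPD method.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
mu_x = mu_y = 1.0
tau = sigma = 0.9 / np.linalg.norm(A, 2)  # ensures tau*sigma*||A||^2 < 1

x, y = np.ones(3), np.ones(3)
for _ in range(2000):
    x_new = (x - tau * (A.T @ y)) / (1.0 + tau * mu_x)    # proximal step in x
    x_bar = 2.0 * x_new - x                               # extrapolation
    y = (y + sigma * (A @ x_bar)) / (1.0 + sigma * mu_y)  # proximal step in y
    x = x_new

# With both terms strongly convex, the unique saddle point is (0, 0)
# and the iterates converge to it linearly.
print(np.linalg.norm(x) < 1e-6, np.linalg.norm(y) < 1e-6)  # True True
```

The step-size coupling $\tau\sigma\|A\|^2 < 1$ ties the method's speed to $\|A\|$ in a way that does not match the lower bound above, which is the inefficiency the LPD method is designed to remove.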


Efficient Algorithms for Smooth Minimax Optimization

Thekumparampil, Kiran K., Jain, Prateek, Netrapalli, Praneeth, Oh, Sewoong

Neural Information Processing Systems

This paper studies first order methods for solving smooth minimax optimization problems $\min_x \max_y g(x,y)$ where $g(\cdot,\cdot)$ is smooth and $g(x,\cdot)$ is concave for each $x$. In terms of $g(\cdot,y)$, we consider two settings -- strongly convex and nonconvex -- and improve upon the best known rates in both. For strongly-convex $g(\cdot, y),\ \forall y$, we propose a new direct optimal algorithm combining Mirror-Prox and Nesterov's AGD, and show that it can find global optimum in $\widetilde{O}\left(1/k^2\right)$ iterations, improving over current state-of-the-art rate of $O(1/k)$. We use this result along with an inexact proximal point method to provide $\widetilde{O}\left(1/k^{1/3}\right)$ rate for finding stationary points in the nonconvex setting where $g(\cdot, y)$ can be nonconvex. This improves over current best-known rate of $O(1/k^{1/5})$.