Solving stochastic weak Minty variational inequalities without increasing batch size
Pethick, Thomas, Fercoq, Olivier, Latafat, Puya, Patrinos, Panagiotis, Cevher, Volkan
This paper introduces a family of stochastic extragradient-type algorithms for a class of nonconvex-nonconcave problems characterized by the weak Minty variational inequality (MVI). Unlike existing results on extragradient methods in the monotone setting, employing diminishing stepsizes is no longer possible in the weak MVI setting. This has led to approaches such as increasing the batch size per iteration, which can however be prohibitively expensive. In contrast, our proposed method involves two stepsizes and requires only one additional oracle evaluation per iteration. We show that it is possible to keep one stepsize fixed while only the second stepsize is taken to be diminishing, making the scheme interesting even in the monotone setting. Almost sure convergence is established, and we provide a unified analysis for this family of schemes, which contains a nonlinear generalization of the celebrated primal-dual hybrid gradient algorithm.

Stochastic first-order methods have been at the core of the current success in deep learning applications. These methods are by now mostly well understood for minimization problems. This is even the case in the nonconvex setting, where there exist matching upper and lower bounds on the complexity of finding an approximately stationary point (Arjevani et al., 2019). The picture becomes less clear when moving beyond minimization into nonconvex-nonconcave minimax problems, or more generally nonmonotone variational inequalities. Even in the deterministic case, finding a stationary point is in general intractable (Daskalakis et al., 2021; Hirsch & Vavasis, 1987). This is in stark contrast with minimization, where only global optimality is NP-hard. An interesting nonmonotone class for which we do have efficient algorithms is characterized by the so-called weak Minty variational inequality (MVI) (Diakonikolas et al., 2021). This problem class captures nontrivial structures such as attracting limit cycles and is governed by a parameter ρ whose negativity increases the degree of nonmonotonicity. Known guarantees for extragradient-type schemes in this regime require the extrapolation stepsize γ to be sufficiently large relative to |ρ|; in other words, it seems that we need to take γ large to guarantee convergence for a large class. This reliance on a large stepsize is at the core of why the community has struggled to provide stochastic variants for weak MVIs: shrinking the stepsize to average out the noise is no longer an option. The only known results effectively increase the batch size at every iteration (Diakonikolas et al., 2021). Pethick et al. (2022) proposed (SEG+), which attempts to tackle the noise by only diminishing the second stepsize.
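As a rough illustration of the two-stepsize idea described above (and not the authors' exact algorithm), the sketch below runs a stochastic extragradient loop in which the extrapolation stepsize gamma stays fixed while the update stepsize alpha_k is the only one taken to be diminishing. The function names, the 1/sqrt(k) schedule, and the toy noisy bilinear problem are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def stochastic_eg_two_stepsizes(z0, oracle, gamma=0.5, alpha0=0.1, n_iters=1000, seed=0):
    """Illustrative two-stepsize stochastic extragradient loop (a sketch, not the paper's method).

    oracle(z, rng) should return an unbiased stochastic estimate of the operator F(z).
    gamma  : fixed extrapolation stepsize (kept large, as weak-MVI analyses require).
    alpha0 : base of the diminishing second stepsize alpha_k = alpha0 / sqrt(k + 1).
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float)
    for k in range(n_iters):
        # Extrapolation step with the *fixed* stepsize gamma (one oracle call).
        z_bar = z - gamma * oracle(z, rng)
        # Update step with the *diminishing* stepsize alpha_k (second oracle call),
        # the only stepsize driven to zero to control the stochastic noise.
        alpha_k = alpha0 / np.sqrt(k + 1)
        z = z - alpha_k * oracle(z_bar, rng)
    return z


if __name__ == "__main__":
    # Toy bilinear saddle point min_x max_y x*y, with operator F(x, y) = (y, -x)
    # perturbed by Gaussian noise to mimic a stochastic oracle.
    def noisy_bilinear(z, rng):
        x, y = z
        return np.array([y, -x]) + 0.1 * rng.standard_normal(2)

    z_final = stochastic_eg_two_stepsizes(np.array([1.0, 1.0]), noisy_bilinear)
    print(z_final)
```

Each iteration uses two oracle evaluations, i.e., only one more than plain stochastic gradient descent-ascent, which matches the cost profile the abstract describes.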
arXiv.org Artificial Intelligence
Feb-17-2023