Decentralized Stochastic Nonconvex Optimization under the Relaxed Smoothness
Luo, Luo, Cui, Xue, Jia, Tingkai, Chen, Cheng
–arXiv.org Artificial Intelligence
This paper studies decentralized optimization problem $f(\mathbf{x})=\frac{1}{m}\sum_{i=1}^m f_i(\mathbf{x})$, where each local function has the form of $f_i(\mathbf{x}) = {\mathbb E}\left[F(\mathbf{x};{\boldsymbol ξ}_i)\right]$ which is $(L_0,L_1)$-smooth but possibly nonconvex and the random variable ${\boldsymbol ξ}_i$ follows distribution ${\mathcal D}_i$. We propose a novel algorithm called decentralized normalized stochastic gradient descent (DNSGD), which can achieve an $ε$-stationary point at each local agent. We present a new framework for analyzing decentralized first-order methods in the relaxed smooth setting, based on the Lyapunov function related to the product of the gradient norm and the consensus error. We show the upper bounds on the sample complexity of ${\mathcal O}(m^{-1}(L_fσ^2Δ_fε^{-4} + σ^2ε^{-2} + L_f^{-2}L_1^3σ^2Δ_fε^{-1} + L_f^{-2}L_1^2σ^2))$ per agent and the communication complexity of $\tilde{\mathcal O}((L_fε^{-2} + L_1ε^{-1})γ^{-1/2}Δ_f)$, where $L_f=L_0 +L_1ζ$, $σ^2$ is the variance of the stochastic gradient, $Δ_f$ is the initial optimal function value gap, $γ$ is the spectral gap of the network, and $ζ$ is the degree of the gradient dissimilarity. In the special case of $L_1=0$, the above results (nearly) match the lower bounds of decentralized stochastic nonconvex optimization under the standard smoothness. We also conduct numerical experiments to show the empirical superiority of our method.
arXiv.org Artificial Intelligence
Sep-29-2025