Optimal Gradient Sliding and its Application to Optimal Distributed Optimization Under Similarity

Neural Information Processing Systems 

We study structured convex optimization problems with additive objective r := p + q, where r is ( \mu -strongly) convex, q is L_q -smooth and convex, and p is L_p -smooth, possibly nonconvex. For this class of problems, we propose an inexact accelerated gradient sliding method that can skip the gradient computation for one of these components while still achieving the optimal complexity of gradient calls of p and q, that is, \mathcal{O}(\sqrt{L_p/\mu}) and \mathcal{O}(\sqrt{L_q/\mu}), respectively. This result is much sharper than the classic black-box complexity \mathcal{O}(\sqrt{(L_p + L_q)/\mu}), especially when the difference between L_p and L_q is large. We then apply the proposed method to distributed optimization problems over master-worker architectures under agents' function similarity, whether due to statistical data similarity or otherwise. The distributed algorithm achieves, for the first time, the lower complexity bounds on both communication and local gradient calls, with the former having been a long-standing open problem.
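
To give a rough sense of the gap between the two bounds, the sketch below evaluates only their leading-order terms, dropping constants and logarithmic factors. The function name and the values L_p = 10^2, L_q = 10^6, \mu = 1 are illustrative assumptions, not taken from the paper, and the code does not implement the sliding method itself.

```python
import math


def gradient_call_orders(L_p, L_q, mu):
    """Leading-order terms of the complexity bounds quoted in the abstract.

    Constants and logarithmic factors are dropped, so these are orders of
    magnitude, not actual iteration counts. Hypothetical helper for illustration.
    """
    black_box = math.sqrt((L_p + L_q) / mu)  # calls of both gradients, black-box bound
    sliding_p = math.sqrt(L_p / mu)          # calls of the gradient of p with sliding
    sliding_q = math.sqrt(L_q / mu)          # calls of the gradient of q with sliding
    return black_box, sliding_p, sliding_q


# Assumed illustrative constants: p is far smoother than q.
black_box, sliding_p, sliding_q = gradient_call_orders(L_p=1e2, L_q=1e6, mu=1.0)

print(f"black-box, each gradient: ~{black_box:.2f}")
print(f"sliding, gradient of p:   ~{sliding_p:.2f}")
print(f"sliding, gradient of q:   ~{sliding_q:.2f}")
```

Under these assumed values, the black-box term for both gradients is about 10^3, while the sliding term for the gradient of p is about 10, with the term for q essentially unchanged. The saving therefore falls on the component with the smaller smoothness constant, which is where a large difference between L_p and L_q pays off.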