Gradient tracking and variance reduction for decentralized optimization and machine learning

Xin, Ran, Kar, Soummya, Khan, Usman A.

arXiv.org Machine Learning 

Decentralized methods to solve finite-sum minimization problems are important in many signal processing and machine learning tasks where the data is distributed over a network of nodes and raw data sharing is not permitted due to privacy and/or resource constraints. In this article, we review decentralized stochastic first-order methods and provide a unified algorithmic framework that combines variance-reduction with gradient tracking to achieve both robust performance and fast convergence. We provide explicit theoretical guarantees of the corresponding methods when the objective functions are smooth and strongly-convex, and show their applicability to non-convex problems via numerical experiments. Throughout the article, we provide intuitive illustrations of the main technical ideas by casting appropriate tradeoffs and comparisons among the methods of interest and by highlighting applications to decentralized training of machine learning models. I. INTRODUCTION In multi-agent networks and large-scale machine learning, when data is available at different devices with limited communication, it is often desirable to seek scalable learning methods that do not require bringing, storing, and processing data at one single location. In this article, we describe decentralized, stochastic first-order methods, which are particularly favorable to such ad-hoc and resource-constrained settings. Specifically, we describe a unified algorithmic framework for combining different variance reduction methods with gradient tracking in order to significantly improve upon the performance of the standard decentralized stochastic gradient descent (DSGD). However, this improvement comes at a price of losing the simplicity of DSGD and we study the added communication, computation, and storage requirements with the help of precise technical statements. For the ease of accessibility, we restrict the theoretical arguments to smooth and strongly-convex objectives, while the applicability to non-convex problems is shown with the help of numerical experiments.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found