Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond

Chulhee Yun, Shashank Rajput, Suvrit Sra

arXiv.org Machine Learning 

In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods. Most existing analyses of these methods assume independent and unbiased gradient estimates obtained via with-replacement sampling. In contrast, we study shuffling-based variants: minibatch and local Random Reshuffling, which draw stochastic gradients without replacement and are thus closer to practice. For smooth functions satisfying the Polyak-Łojasiewicz condition, we obtain convergence bounds (in the large epoch regime) which show that these shuffling-based variants converge faster than their with-replacement counterparts. Moreover, we prove matching lower bounds showing that our convergence analysis is tight. Finally, we propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
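To make the sampling distinction concrete, below is a minimal sketch, not the authors' algorithm or analysis, of local SGD where each client runs a Random Reshuffling epoch (each example visited exactly once per epoch, in a fresh random order) before the server averages the iterates. The least-squares objective and all names (`local_rr_epoch`, `local_sgd_rr`, the client data layout) are illustrative assumptions introduced here.

```python
import numpy as np

def grad(w, x, y):
    """Gradient of the per-example squared loss 0.5 * (x @ w - y)**2."""
    return (x @ w - y) * x

def local_rr_epoch(w, X, Y, lr, rng):
    """One local Random Reshuffling epoch: without-replacement sampling,
    i.e. visit every example exactly once in a freshly drawn random order."""
    for i in rng.permutation(len(Y)):
        w = w - lr * grad(w, X[i], Y[i])
    return w

def local_sgd_rr(client_data, w0, lr=0.01, epochs=50, seed=0):
    """Local SGD / federated averaging where each client runs Random
    Reshuffling locally, followed by a server averaging step."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(epochs):
        local_iterates = [local_rr_epoch(w.copy(), X, Y, lr, rng)
                          for X, Y in client_data]
        w = np.mean(local_iterates, axis=0)  # synchronization (averaging)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, n_per_client, n_clients = 5, 20, 4
    w_true = rng.normal(size=d)
    clients = []
    for _ in range(n_clients):
        X = rng.normal(size=(n_per_client, d))
        Y = X @ w_true + 0.01 * rng.normal(size=n_per_client)
        clients.append((X, Y))
    w_hat = local_sgd_rr(clients, w0=np.zeros(d))
    print("estimation error:", np.linalg.norm(w_hat - w_true))
```

Replacing `rng.permutation(len(Y))` with `rng.integers(0, len(Y), size=len(Y))` would recover the with-replacement sampling assumed in most prior analyses; the paper's comparison is between these two sampling schemes.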