New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity
Pan Zhou, Xiaotong Yuan, Jiashi Feng
As an incremental-gradient algorithm, hybrid stochastic gradient descent (HSGD) enjoys the merits of both stochastic and full gradient methods for finite-sum minimization problems. However, the existing rate-of-convergence analysis for HSGD is carried out under with-replacement sampling (WRS) and is restricted to convex problems. It is not clear whether HSGD still carries these advantages under the common practice of without-replacement sampling (WoRS) and for non-convex problems. In this paper, we affirmatively answer this open question by showing that, under WoRS and for both convex and non-convex problems, HSGD with a constant step size can still match full gradient descent in rate of convergence, while maintaining a sample-size-independent incremental first-order oracle complexity comparable to that of stochastic gradient descent. For a special class of finite-sum problems with linear prediction models, our convergence results can be further improved in some cases. Extensive numerical results confirm our theoretical findings and demonstrate the favorable efficiency of WoRS-based HSGD.
Neural Information Processing Systems
Dec-31-2018
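
For readers who want a concrete picture of the sampling scheme described in the abstract, the following is a minimal, hypothetical Python sketch of an incremental-gradient loop for finite-sum least-squares minimization under without-replacement sampling (the data are shuffled once per epoch and then swept in mini-batches). The hybrid direction used here, an SVRG-style correction around a periodically refreshed full gradient, is only an illustrative stand-in for how stochastic and full gradients can be combined; it is not the exact HSGD update or step-size schedule analyzed in the paper, and the function name `hybrid_sgd_wors`, the least-squares loss, and all hyperparameter values are assumptions made for this example.

```python
import numpy as np

def hybrid_sgd_wors(X, y, step_size=0.1, epochs=20, batch_size=10, anchor_every=5):
    """Incremental-gradient loop with without-replacement sampling (WoRS).

    Illustrative only: the hybrid direction below is an SVRG-style correction
    around a periodically refreshed full gradient, used as a stand-in for a
    hybrid stochastic/full gradient update; it is not the paper's exact HSGD scheme.
    """
    n, d = X.shape
    w = np.zeros(d)
    rng = np.random.default_rng(0)
    for epoch in range(epochs):
        if epoch % anchor_every == 0:
            # Periodic full-gradient "anchor": the full-gradient ingredient of the hybrid step.
            w_anchor = w.copy()
            full_grad = X.T @ (X @ w_anchor - y) / n
        perm = rng.permutation(n)  # WoRS: shuffle once per epoch, then sweep the data
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            g_now = Xb.T @ (Xb @ w - yb) / len(idx)         # mini-batch gradient at w
            g_anc = Xb.T @ (Xb @ w_anchor - yb) / len(idx)  # same mini-batch at the anchor
            w -= step_size * (g_now - g_anc + full_grad)    # constant step size
    return w

# Example usage on synthetic least-squares data.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 20))
    w_true = rng.normal(size=20)
    y = X @ w_true + 0.01 * rng.normal(size=500)
    w_hat = hybrid_sgd_wors(X, y)
    print("parameter error:", np.linalg.norm(w_hat - w_true))
```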