A Coefficient Makes SVRG Effective

Yin, Yida, Xu, Zhiqiu, Li, Zhiyuan, Darrell, Trevor, Liu, Zhuang

Nov-9-2023–arXiv.org Machine Learning

Stochastic Variance Reduced Gradient (SVRG), introduced by Johnson & Zhang (2013), is a theoretically compelling optimization method. However, as Defazio & Bottou (2019) highlights, its effectiveness in deep learning is yet to be proven. In this work, we demonstrate the potential of SVRG in optimizing real-world neural networks. Our analysis finds that, for deeper networks, the strength of the variance reduction term in SVRG should be smaller and decrease as training progresses. Inspired by this, we introduce a multiplicative coefficient $\alpha$ to control the strength and adjust it through a linear decay schedule. We name our method $\alpha$-SVRG. Our results show $\alpha$-SVRG better optimizes neural networks, consistently reducing training loss compared to both baseline and the standard SVRG across various architectures and image classification datasets. We hope our findings encourage further exploration into variance reduction techniques in deep learning. Code is available at https://github.com/davidyyd/alpha-SVRG.

artificial intelligence, coefficient, machine learning, (16 more...)

arXiv.org Machine Learning

Nov-9-2023

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.87)
  - Vision (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found