Statistical Adaptive Stochastic Gradient Methods
Pengchuan Zhang, Hunter Lang, Qiang Liu, Lin Xiao
We propose a statistical adaptive procedure called SALSA for automatically scheduling the learning rate (step size) in stochastic gradient methods. SALSA first uses a smoothed stochastic line-search procedure to gradually increase the learning rate, then automatically switches to a statistical method to decrease the learning rate. The line-search procedure "warms up" the optimization process, reducing the need for expensive trial and error in setting an initial learning rate. The method for decreasing the learning rate is based on a new statistical test for detecting stationarity when using a constant step size. Unlike in prior work, our test applies to a broad class of stochastic gradient algorithms without modification. The combined method is highly robust and autonomous, and it matches the performance of the best hand-tuned learning rate schedules in our experiments on several deep learning tasks.
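To make the "decrease on stationarity" idea concrete, here is a minimal sketch. It is not the statistical test proposed in the paper, which the abstract does not specify. As a stand-in, it compares the mean loss in the two halves of a recent window with a simple two-sample z-score and cuts the learning rate when the second half is not clearly lower. The names `looks_stationary` and `run_sgd`, the window size, the threshold, and the decay factor are all illustrative assumptions, and the line-search warm-up phase is omitted.

```python
import math
import random
from statistics import mean, stdev


def looks_stationary(losses, z_threshold=1.0):
    """Return True if the second half of `losses` is not clearly lower than the first.

    Stand-in heuristic (an assumption, not the paper's test): a two-sample
    z-score on the half-window means, treating samples as roughly independent.
    """
    if len(losses) < 4:
        raise ValueError("need at least 4 loss values")
    half = len(losses) // 2
    first, second = losses[:half], losses[half:]
    # Standard error of the difference between the two half-window means.
    se = math.sqrt(stdev(first) ** 2 / len(first) + stdev(second) ** 2 / len(second))
    if se == 0.0:
        # No noise at all: stationary unless the mean actually dropped.
        return mean(first) <= mean(second)
    z = (mean(first) - mean(second)) / se
    return z < z_threshold


def run_sgd(steps=4000, lr=0.5, window=200, decay=0.1, seed=0):
    """SGD on the toy objective 0.5 * x**2 with additive gradient noise.

    The learning rate is held constant within each window and multiplied by
    `decay` whenever the window of losses looks stationary.
    """
    rng = random.Random(seed)
    x = 5.0
    losses = []
    lr_history = [lr]
    for _ in range(steps):
        grad = x + rng.gauss(0.0, 1.0)  # noisy gradient of 0.5 * x**2
        x -= lr * grad
        losses.append(0.5 * x * x)
        if len(losses) == window:
            if looks_stationary(losses):
                lr *= decay
                lr_history.append(lr)
            losses = []  # start a fresh window
    return x, lr_history


if __name__ == "__main__":
    final_x, lr_history = run_sgd()
    print("final x:", final_x)
    print("learning rates used:", lr_history)
```

With a constant step size, the noisy iterates stop improving once they reach a noise-dominated plateau. Detecting that plateau from the loss stream is what triggers each decrease in this toy loop.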
Feb-24-2020
- Country:
- North America
- United States
- Washington > King County
- Redmond (0.04)
- Texas > Travis County
- Austin (0.14)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Georgia > Fulton County
- Atlanta (0.04)
- Canada > British Columbia
- Europe
- Russia (0.04)
- Austria (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.14)
- Oxfordshire > Oxford (0.04)
- Hungary > Budapest
- Budapest (0.04)
- Asia
- Genre:
- Research Report > New Finding (0.87)
- Industry:
- Education (0.46)