Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties
Brett Daley, Christopher Amato
Many popular adaptive gradient methods such as Adam and RMSProp rely on an exponential moving average (EMA) to normalize their stepsizes. While the EMA makes these methods highly responsive to new gradient information, recent research has shown that it also causes divergence on at least one convex optimization problem. We propose a novel method called Expectigrad, which adjusts stepsizes according to a per-component unweighted mean of all historical gradients and computes a bias-corrected momentum term jointly between the numerator and denominator. We prove that, on every instance of the optimization problem known to cause Adam to diverge, Expectigrad cannot diverge. We also establish a regret bound in the general stochastic nonconvex setting that suggests Expectigrad is less susceptible to gradient variance than existing methods are. Testing Expectigrad on several high-dimensional machine learning tasks, we find that it often compares favorably to state-of-the-art methods with little hyperparameter tuning.

Efficiently training deep neural networks has proven crucial for achieving state-of-the-art results in machine learning. At the core of these successes lies the backpropagation algorithm (Rumelhart et al., 1986), which provides a general procedure for computing the gradient of a loss measure with respect to the parameters of an arbitrary network. Because exact gradient computation over an entire dataset is expensive, training is often conducted using randomly sampled minibatches of data instead. Consequently, training can be modeled as a stochastic optimization problem in which the loss is minimized in expectation.
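As a rough illustration of the update the abstract describes, the following Python sketch normalizes each gradient component by an unweighted running mean and applies bias-corrected momentum to the resulting ratio. It is not the authors' reference implementation: it assumes the mean is taken over squared gradients (in the style of RMSProp), counts every step in the mean, and uses hypothetical names and defaults (`expectigrad_sketch`, `lr`, `beta`, `eps`, `quadratic_grad`) chosen only for this example.

```python
import math


def expectigrad_sketch(grad_fn, x0, lr=0.1, beta=0.9, eps=1e-8, steps=1000):
    """Minimize a function from its (possibly stochastic) gradient.

    Illustrative only: the function name, arguments, and defaults are
    assumptions, not values taken from the paper.
    """
    x = list(x0)
    s = [0.0] * len(x)  # per-component running SUM of squared gradients
    m = [0.0] * len(x)  # per-component momentum of the normalized gradient
    for t in range(1, steps + 1):
        g = grad_fn(x)
        for i, gi in enumerate(g):
            s[i] += gi * gi
            # Unweighted mean over all t steps so far (no exponential decay).
            denom = eps + math.sqrt(s[i] / t)
            # Momentum is applied to the ratio gi / denom, so numerator and
            # denominator are smoothed jointly rather than separately.
            m[i] = beta * m[i] + (1.0 - beta) * gi / denom
            # Bias correction compensates for the zero-initialized momentum.
            x[i] -= lr * m[i] / (1.0 - beta ** t)
    return x


def quadratic_grad(x):
    # Gradient of the toy objective f(x) = sum((x_i - 3)^2).
    return [2.0 * (xi - 3.0) for xi in x]


result = expectigrad_sketch(quadratic_grad, [0.0, 10.0])
```

Because the denominator averages over the whole history rather than an exponential window, each new gradient shifts it by less and less as training proceeds, which is the behavior the abstract contrasts with EMA-based normalization.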
Oct-3-2020