Black Box Lie Group Preconditioners for SGD
–arXiv.org Artificial Intelligence
A matrix free and a low rank approximation preconditioner are proposed to accelerate the convergence of stochastic gradient descent (SGD) by exploiting curvature information sampled from Hessian-vector products or finite differences of parameters and gradients similar to the BFGS algorithm. Both preconditioners are fitted with an online updating manner minimizing a criterion that is free of line search and robust to stochastic gradient noise, and further constrained to be on certain connected Lie groups to preserve their corresponding symmetry or invariance, e.g., orientation of coordinates by the connected general linear group with positive determinants. The Lie group's equivariance property facilitates preconditioner fitting, and its invariance property saves any need of damping, which is common in second-order optimizers, but difficult to tune. The learning rate for parameter updating and step size for preconditioner fitting are naturally normalized, and their default values work well in most situations.
arXiv.org Artificial Intelligence
Nov-8-2022
- Country:
- North America > United States
- New York > Tompkins County
- Ithaca (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- California > Santa Clara County
- Sunnyvale (0.04)
- New York > Tompkins County
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America > United States
- Genre:
- Research Report (0.65)
- Industry:
- Transportation > Air (0.41)
- Technology: