Normalized Gradients for All
The reduction is very generic, so I will apply it to OGD, Dual Averaging, and parameter-free algorithms. First, I consider a close relative of normalized gradients: AdaGrad-norm stepsizes. Then, I show similar results for normalized gradients. The core ideas are directly derived from Levy [2017]. Indeed, the main aim of this note is to show that some very recent optimization results on normalized gradients are in fact well known in the online learning community. The author's hope is to instill a greater awareness of, and academic respect for, online learning results in the optimization community.
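For readers unfamiliar with the two stepsize schemes the abstract contrasts, here is a minimal sketch of their standard textbook forms; the function names, constants, and toy objective below are illustrative assumptions, not taken from the note itself.

    import numpy as np

    def adagrad_norm_step(x, grad, sum_sq, eta=1.0):
        """One AdaGrad-norm step: a single scalar stepsize that shrinks
        with the running sum of squared gradient norms."""
        sum_sq += float(np.dot(grad, grad))        # accumulate ||g_t||^2
        x = x - (eta / np.sqrt(sum_sq)) * grad     # x_{t+1} = x_t - eta * g_t / sqrt(sum)
        return x, sum_sq

    def normalized_gd_step(x, grad, eta=0.1):
        """One normalized-gradient step: only the gradient's direction is
        used, so the update is invariant to the gradient's magnitude."""
        norm = np.linalg.norm(grad)
        return x if norm == 0.0 else x - eta * grad / norm

    # Toy run on f(x) = ||x||^2 / 2, whose gradient at x is x itself.
    x, sum_sq = np.array([3.0, -4.0]), 0.0
    for _ in range(100):
        x, sum_sq = adagrad_norm_step(x, x.copy(), sum_sq)
    print(x)  # approaches the minimizer at the origin

The point of the comparison is that both schemes remove the dependence on the gradient scale: AdaGrad-norm does so adaptively through the accumulated norms, while normalized gradients do so by discarding the magnitude outright.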
arXiv.org Artificial Intelligence
Aug-10-2023