Normalized Gradients for All

Orabona, Francesco

arXiv.org Artificial Intelligence 

The reduction is very generic, so I will apply it to OGD, Dual Averaging, and parameter-free algorithms. First, I will consider a close relative of normalized gradients: AdaGrad-norm stepsizes. Then, I will show similar results for normalized gradients. The core ideas are directly derived from Levy [2017]. Indeed, the main aim of this note is to show that some very recent optimization results on normalized gradients are in fact well known in the online learning community. The author's hope is to instill greater awareness of, and academic respect for, online learning results in the optimization community.
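To make the two stepsize schemes mentioned above concrete, here is a minimal sketch of gradient descent with AdaGrad-norm stepsizes (the stepsize decays with the square root of the cumulative squared gradient norms) alongside plain normalized gradient descent (the update direction is the gradient scaled to unit norm). This is only an illustrative implementation on a toy quadratic objective, not the paper's reduction; the function names and the test objective are my own choices.

```python
import math

def grad(x):
    # Gradient of the illustrative test objective f(x) = 0.5 * ||x||^2.
    return list(x)

def sq_norm(v):
    return sum(vi * vi for vi in v)

def adagrad_norm_gd(x, eta=1.0, steps=200):
    """GD with AdaGrad-norm stepsizes: eta_t = eta / sqrt(sum_{s<=t} ||g_s||^2)."""
    x = list(x)
    sq_sum = 0.0
    for _ in range(steps):
        g = grad(x)
        sq_sum += sq_norm(g)
        step = eta / math.sqrt(sq_sum + 1e-12)  # small constant avoids division by zero
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def normalized_gd(x, eta=0.05, steps=200):
    """GD with normalized gradients: x <- x - eta * g / ||g||."""
    x = list(x)
    for _ in range(steps):
        g = grad(x)
        n = math.sqrt(sq_norm(g))
        if n == 0.0:  # already at a stationary point
            break
        x = [xi - eta * gi / n for xi, gi in zip(x, g)]
    return x
```

Note that normalized GD with a fixed eta does not converge to the exact minimizer; it oscillates in a ball of radius roughly eta around it, which is why such schemes are usually paired with decaying or adaptive stepsizes.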
