Reviews: Optimal Stochastic and Online Learning with Individual Iterates
–Neural Information Processing Systems
This paper proposes an online stochastic optimization algorithm (similar to SGD) that has optimal convergence rate of the last iterate in two settings (O(1/sqrt(T)) for Lipschitz convex functions and O(1/T) strongly convex functions), and additionally it allows an arbitrary non-smooth regularizer (e.g. Many subsets of the properties are achieved by prior works. Namely, it was known how to achieve these results up to O(log T) factors. It was known how to achieve the optimal rates with averaging, which, however, destroys sparsity. However, this paper has the first algorithm that has all the properties simultaneously and removes the log factors. The paper has rigorous proofs of the convergence rates and extensive numerical experiments.
Neural Information Processing Systems
Jan-22-2025, 18:15:04 GMT