

Dual Adaptivity: A Universal Algorithm for Minimizing the Adaptive Regret of Convex Functions

Neural Information Processing Systems

To deal with changing environments, a new performance measure, adaptive regret, defined as the maximum static regret over any contiguous interval, was proposed in online learning. Under the setting of online convex optimization, several algorithms have been successfully developed to minimize the adaptive regret. However, existing algorithms lack universality in the sense that they can only handle one type of convex functions and need a priori knowledge of parameters. By contrast, there exist universal algorithms, such as MetaGrad, that attain optimal static regret for multiple types of convex functions simultaneously. Along this line of research, this paper presents the first universal algorithm for minimizing the adaptive regret of convex functions. Specifically, we borrow the idea of maintaining multiple learning rates in MetaGrad to handle the uncertainty of functions, and utilize the technique of sleeping experts to capture changing environments.
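The adaptive-regret measure described in the abstract has a concrete brute-force form when the comparator class is finite: take the maximum, over all intervals, of the learner's loss minus the best fixed comparator's loss on that interval. The helper below is a hypothetical illustration of the definition, not code from the paper.

```python
import numpy as np

def adaptive_regret(learner_losses, comparator_losses):
    """Brute-force adaptive regret: max over all intervals [r, s) of the
    learner's cumulative loss minus the best fixed comparator's loss there.

    learner_losses:    length-T sequence of the learner's per-round losses.
    comparator_losses: T x K array, per-round losses of K fixed comparators.
    """
    learner_losses = np.asarray(learner_losses, dtype=float)
    comparator_losses = np.asarray(comparator_losses, dtype=float)
    T, K = comparator_losses.shape
    # Prefix sums, so any interval's loss is a difference of two entries.
    L = np.concatenate([[0.0], np.cumsum(learner_losses)])
    C = np.vstack([np.zeros(K), np.cumsum(comparator_losses, axis=0)])
    best = -np.inf
    for r in range(T):
        for s in range(r + 1, T + 1):
            best = max(best, (L[s] - L[r]) - (C[s] - C[r]).min())
    return best
```

Static regret is the special case where only the full interval [0, T) is considered; adaptive regret is at least as large, which is why it captures changing environments.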


MetaGrad: Multiple Learning Rates in Online Learning

Neural Information Processing Systems

In online convex optimization it is well known that certain subclasses of objective functions are much easier than arbitrary convex functions. We are interested in designing adaptive methods that can automatically get fast rates in as many such subclasses as possible, without any manual tuning. Previous adaptive methods are able to interpolate between strongly convex and general convex functions. We present a new method, MetaGrad, that adapts to a much broader class of functions, including exp-concave and strongly convex functions, but also various types of stochastic and non-stochastic functions without any curvature. For instance, MetaGrad can achieve logarithmic regret on the unregularized hinge loss, even though it has no curvature, if the data come from a favourable probability distribution. MetaGrad's main feature is that it simultaneously considers multiple learning rates. Unlike all previous methods with provable regret guarantees, however, its learning rates are not monotonically decreasing over time and are not tuned based on a theoretically derived bound on the regret. Instead, they are weighted directly proportional to their empirical performance on the data using a tilted exponential weights master algorithm.
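The multiple-learning-rates idea can be illustrated with a deliberately simplified one-dimensional sketch. To be clear about assumptions: real MetaGrad runs an online Newton step for each learning-rate expert, whereas here each expert takes a plain projected gradient step on the quadratic surrogate loss, and the function name, learning-rate grid, and step sizes below are illustrative choices, not the authors' implementation. What the sketch does preserve is the tilted exponential weights master: experts are aggregated with weights proportional to pi_i * eta_i, and the weights pi_i are updated by their empirical performance on the surrogate rather than by a decreasing schedule.

```python
import numpy as np

# Simplified 1-D sketch of MetaGrad's tilted exponential weights master.
def metagrad_sketch(grad_fn, T=200, D=1.0, G=1.0, n_rates=5):
    etas = np.array([1.0 / (5.0 * G * D * 2.0**i) for i in range(n_rates)])
    pis = np.full(n_rates, 1.0 / n_rates)   # master weights over learning rates
    xs = np.zeros(n_rates)                  # one iterate per learning rate
    played = []
    for t in range(T):
        # Tilted aggregation: expert i is weighted by pi_i * eta_i.
        w = np.dot(pis * etas, xs) / np.dot(pis, etas)
        played.append(w)
        g = grad_fn(w, t)                   # (sub)gradient at the played point
        r = g * (w - xs)                    # instantaneous regret vs. each expert
        surrogate = -etas * r + (etas * r) ** 2   # MetaGrad's surrogate loss
        pis = pis * np.exp(-surrogate)      # weights track empirical performance
        pis /= pis.sum()
        # Simplified expert update: projected gradient step on the surrogate
        # (real MetaGrad uses an online Newton step here).
        xs = np.clip(xs - D * (etas * g - 2.0 * etas**2 * g * r), -D, D)
    return np.array(played)
```

On a toy sequence of losses f_t(w) = (w - 0.5)^2 with gradients clipped to [-1, 1], the played iterates drift toward the minimizer 0.5 without any manual tuning of a single learning rate.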


MetaGrad: Multiple Learning Rates in Online Learning

Tim van Erven, Wouter M. Koolen

Neural Information Processing Systems







Reviews: MetaGrad: Multiple Learning Rates in Online Learning

Neural Information Processing Systems

Online convex minimization is a fundamental problem in learning theory with various applications. Here, we are given a sequence of convex functions f_1, ..., f_T in an online fashion, and we seek to predict points that compete with the minimizer of the sum, i.e., to minimize the regret. This is a very well-studied question, with various guarantees known even when we are only given gradient information about the functions f_i. For general convex functions this leads to a regret of O(sqrt(T)), and for some special cases we know better guarantees: O(log T) for strongly convex functions, O(d log T) for exp-concave functions, etc. However, the algorithms that achieve the latter, better guarantees are quite different, and in some cases they need to know bounds on the specific parameters of the functions (e.g., the strong-convexity parameter of the f_i's, although this can now be avoided).
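The baseline sqrt(T) guarantee the review mentions is achieved by projected online gradient descent with decaying step sizes. The sketch below is a minimal illustration of that standard scheme, not the reviewed algorithm; the function name and toy setup are assumptions for the example.

```python
import numpy as np

# Projected online gradient descent (OGD) for general convex losses on the
# Euclidean ball of radius D, with gradients bounded by G. The step size
# D / (G * sqrt(t)) yields the standard O(sqrt(T)) regret bound.
def ogd(grad_fn, T, d=2, D=1.0, G=1.0):
    x = np.zeros(d)
    iterates = []
    for t in range(1, T + 1):
        iterates.append(x.copy())
        g = grad_fn(x, t)                     # gradient oracle for f_t
        x = x - (D / (G * np.sqrt(t))) * g    # decaying step size
        norm = np.linalg.norm(x)
        if norm > D:                          # project back onto the D-ball
            x *= D / norm
    return np.array(iterates)
```

For strongly convex or exp-concave losses, swapping in step sizes of order 1/t (or an online Newton step) improves this to the logarithmic rates quoted above, but, as the review notes, those variants classically require knowing the curvature parameters in advance.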