Multiple Learning Rates


MetaGrad: Multiple Learning Rates in Online Learning

Erven, Tim van, Koolen, Wouter M.

Neural Information Processing Systems

In online convex optimization it is well known that certain subclasses of objective functions are much easier than arbitrary convex functions. We are interested in designing adaptive methods that can automatically get fast rates in as many such subclasses as possible, without any manual tuning. Previous adaptive methods are able to interpolate between strongly convex and general convex functions. We present a new method, MetaGrad, that adapts to a much broader class of functions, including exp-concave and strongly convex functions, but also various types of stochastic and non-stochastic functions without any curvature. For instance, MetaGrad can achieve logarithmic regret on the unregularized hinge loss, even though it has no curvature, if the data come from a favourable probability distribution. MetaGrad's main feature is that it simultaneously considers multiple learning rates. Unlike all previous methods with provable regret guarantees, however, its learning rates are not monotonically decreasing over time and are not tuned based on a theoretically derived bound on the regret. Instead, they are weighted directly proportional to their empirical performance on the data using a tilted exponential weights master algorithm.
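
The master step described in the abstract can be sketched in a few lines. The following is a simplified illustration, not the paper's full algorithm: the real MetaGrad runs a per-learning-rate Online Newton Step on a quadratic surrogate loss, whereas the slaves below take plain projected gradient steps; the class name, the grid size n_etas, and the bounds D (domain diameter) and G (gradient norm) are illustrative assumptions.

```python
import numpy as np

class MetaGradSketch:
    """Hedged sketch of MetaGrad's tilted exponential weights master.

    Simplifications: slaves take plain projected gradient steps instead
    of the paper's per-eta Online Newton Step; D bounds the domain
    diameter and G the gradient norms (both assumed known here).
    """

    def __init__(self, dim, D=1.0, G=1.0, n_etas=10):
        # Exponentially spaced grid of learning rates, largest ~ 1/(5*D*G).
        self.etas = np.array([2.0 ** -i / (5.0 * D * G) for i in range(n_etas)])
        self.pi = np.full(n_etas, 1.0 / n_etas)  # master weights (prior)
        self.slaves = np.zeros((n_etas, dim))    # one iterate per eta
        self.D = D

    def predict(self):
        # Master prediction: eta-tilted weighted average of slave iterates.
        tilt = self.pi * self.etas
        return tilt @ self.slaves / tilt.sum()

    def update(self, w_t, grad):
        # Surrogate regret terms s_i = g_t . (w_t - w_i) for each slave.
        s = (w_t - self.slaves) @ grad
        # Quadratic surrogate loss: -eta * s + (eta * s)^2.
        surrogate = -self.etas * s + (self.etas * s) ** 2
        # Master weights track empirical performance directly:
        # multiply by exp(-surrogate loss), then renormalize.
        self.pi *= np.exp(-surrogate)
        self.pi /= self.pi.sum()
        # Simplified slave step: gradient descent with step eta_i,
        # projected back onto the ball of radius D.
        self.slaves -= self.etas[:, None] * grad
        norms = np.maximum(np.linalg.norm(self.slaves, axis=1, keepdims=True), 1e-12)
        self.slaves *= np.minimum(1.0, self.D / norms)
```

A round then consists of w = alg.predict(), observing a subgradient g of f_t at w, and calling alg.update(w, g). Note how the master update depends only on observed surrogate losses, not on a theoretically derived regret bound.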


Reviews: MetaGrad: Multiple Learning Rates in Online Learning

Neural Information Processing Systems

Online convex minimization is a fundamental problem in learning theory with various applications. Here, we are given a sequence of convex functions f1,...,fT in an online fashion, and we must choose a point at each round so as to minimize the regret against the minimizer of the sum. This is a very well studied question, with various guarantees known even when we are only given gradient information about the functions fi. For general convex functions this leads to a regret of O(sqrt(T)), and for some special cases better guarantees are known: O(log T) for strongly convex functions, O(d log T) for exp-concave functions, etc. However, the algorithms that achieve these better guarantees are quite different, and in some cases they need to know bounds on specific parameters of the functions (e.g., the strong-convexity constants of the fi's, although this can now be avoided).
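
For concreteness, the O(sqrt(T)) baseline the reviewer mentions is achieved by plain online (sub)gradient descent with a decaying step size. The sketch below assumes gradient-only access and a Euclidean ball of radius D as the decision set; the function name and the bounds D and G are illustrative, not taken from the review.

```python
import numpy as np

def online_gradient_descent(subgradient, T, dim, D=1.0, G=1.0):
    """Plain online (sub)gradient descent: the O(sqrt(T)) baseline.

    subgradient(t, w) should return a subgradient of f_t at w
    (gradient-only access). The step size D / (G * sqrt(t)) and the
    projection onto the Euclidean ball of radius D are the standard
    choices for the general convex case.
    """
    w = np.zeros(dim)
    iterates = []
    for t in range(1, T + 1):
        iterates.append(w.copy())            # play w_t, then observe f_t
        g = subgradient(t, w)
        w = w - (D / (G * np.sqrt(t))) * g   # gradient step
        norm = np.linalg.norm(w)
        if norm > D:                         # project back onto the ball
            w *= D / norm
    return iterates
```

Adaptive methods like MetaGrad aim to match this worst-case rate while automatically improving to O(log T) or O(d log T) when the function sequence allows it.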


CRPS Learning

Berrisch, Jonathan, Ziel, Florian

arXiv.org Machine Learning

Combination and aggregation techniques can improve forecast accuracy substantially. This also holds for probabilistic forecasting methods, where full predictive distributions are combined. There are several time-varying and adaptive weighting schemes, such as Bayesian model averaging (BMA). However, the performance of different forecasters may vary not only over time but also across parts of the distribution: one forecaster may be more accurate in the center of the distribution, while others perform better at predicting the tails. Consequently, we introduce a new weighting procedure that accounts for varying performance both across time and across the distribution. We discuss pointwise online aggregation algorithms that optimize with respect to the continuous ranked probability score (CRPS). After analyzing the theoretical properties of a fully adaptive Bernstein online aggregation (BOA) method, we introduce smoothing procedures for pointwise CRPS learning. The properties are confirmed and discussed using simulation studies. Additionally, we illustrate the performance in a forecasting study for carbon markets, where we predict the distribution of European emission allowance prices.
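
As a rough illustration of pointwise CRPS learning, the sketch below aggregates expert quantile functions level by level, using the pinball (quantile) loss whose integral over quantile levels recovers the CRPS. It substitutes plain exponential weights for the paper's BOA update and omits the smoothing across levels; all names and the quantile grid are illustrative assumptions.

```python
import numpy as np

def pointwise_crps_aggregation(expert_quantiles, observations, eta=1.0):
    """Pointwise aggregation of predictive distributions (sketch).

    expert_quantiles[t] has shape (n_experts, n_taus): each expert's
    predicted quantiles at levels taus at time t. A separate weight
    vector is kept per quantile level and updated with exponential
    weights on the pinball (quantile) loss, whose integral over levels
    gives the CRPS. Plain exponential weights stand in for the paper's
    BOA update, and no smoothing across levels is applied.
    """
    n_experts, n_taus = expert_quantiles[0].shape
    taus = (np.arange(n_taus) + 0.5) / n_taus       # quantile level grid
    W = np.full((n_taus, n_experts), 1.0 / n_experts)
    combined = []
    for Q, y in zip(expert_quantiles, observations):
        q_hat = np.einsum('ke,ek->k', W, Q)         # combine per level
        combined.append(q_hat)
        diff = y - Q                                # (n_experts, n_taus)
        pinball = np.where(diff >= 0, taus * diff, (taus - 1.0) * diff)
        W *= np.exp(-eta * pinball.T)               # reweight per level
        W /= W.sum(axis=1, keepdims=True)
    return np.array(combined)
```

Keeping an independent weight vector per quantile level is what lets one expert dominate in the center of the distribution while another dominates in the tails.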


A Second-order Bound with Excess Losses

Gaillard, Pierre, Stoltz, Gilles, Van Erven, Tim

arXiv.org Machine Learning

We study online aggregation of the predictions of experts, and first show new second-order regret bounds in the standard setting, which are obtained via a version of the Prod algorithm (and also a version of the polynomially weighted average algorithm) with multiple learning rates. These bounds are in terms of excess losses, the differences between the instantaneous losses suffered by the algorithm and the ones of a given expert. We then demonstrate the usefulness of these bounds in the context of experts that report their confidences as a number in the interval [0, 1], using a generic reduction to the standard setting. We conclude with two other applications in the standard setting, which improve the known bounds in case of small excess losses and show a bounded regret against i.i.d. sequences of losses.
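
A minimal sketch of a Prod-style update with one learning rate per expert, driven by excess losses, might look as follows. The fixed per-expert learning rates and the loss range [0, 1] are assumptions for illustration; the paper's variants also tune the learning rates over time.

```python
import numpy as np

def ml_prod(expert_losses, etas=None):
    """Prod-style aggregation with one learning rate per expert (sketch).

    expert_losses has shape (T, n_experts) with losses assumed in [0, 1].
    Each expert keeps a potential w_i updated multiplicatively with its
    excess loss (algorithm's loss minus the expert's); predictions use
    weights proportional to eta_i * w_i, as in the multiple-learning-rate
    Prod variants. Fixed etas are a simplification.
    """
    T, K = expert_losses.shape
    if etas is None:
        # eta <= 1/2 keeps the multiplicative factor 1 + eta * x positive
        # for excess losses x in [-1, 1].
        etas = np.full(K, 0.5 / np.sqrt(T))
    w = np.ones(K)
    algorithm_losses = []
    for t in range(T):
        p = etas * w
        p /= p.sum()                            # prediction weights
        loss_alg = p @ expert_losses[t]         # mixture loss this round
        algorithm_losses.append(loss_alg)
        excess = loss_alg - expert_losses[t]    # excess loss per expert
        w *= 1.0 + etas * excess                # Prod update
    return np.array(algorithm_losses)
```

Because each expert's potential grows with its own excess losses, the resulting bounds are second-order: they depend on the variance-like sums of excess losses rather than on the horizon T alone.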