Generalised Mixability, Constant Regret, and Bayesian Updating
Reid, Mark D., Frongillo, Rafael M., Williamson, Robert C.
The combination or aggregation of predictions is central to machine learning. Traditional Bayesian updating can be viewed as a particular way of aggregating information that takes account of prior information. Notions of "mixability" which play a central role in the setting of prediction with expert advice offer a more general way to aggregate, and which take account of the loss function used to evaluate predictions (how well they fit the data). As shown by Vovk [2001], his more general "aggregating algorithm" reduces to Bayesian updating when log loss is used. However, as we will show there is another design variable that to date has not been fully exploited. The aggregating algorithm makes use of a distance between the current distribution and a prior which serves as a regulariser. In particular the aggregating algorithm uses the KL-divergence. We consider the general setting of an arbitrary loss and an arbitrary regulariser (in the form of a Bregman divergence) and show that we recover the core technical 1 result of traditional mixability: if a loss is mixable in the generalised sense then there is a generalised aggregating algorithm which can be guaranteed to have constant regret.
Mar-10-2014