Supplementary Material for: Fast rates for prediction with limited expert advice

Inria

1 Notation


We denote by $R(\cdot)$ the expected risk function, $R(\cdot) = \mathbb{E}[\ell(\cdot, Y)]$, and define $\widetilde{R}$ similarly with $\ell$ replaced by $\widetilde{\ell}$. The excess loss of any predictor remains unchanged when replacing $\ell$ by $\widetilde{\ell}$. Therefore, without loss of generality, we can assume that the loss function takes values in $[0, B]$, which we do for the remainder of the paper.

The following lemma is technical; it will be used in the proof of the instance-dependent bound (Theorem M-5.3).

Lemma 2. Let $x \geq 1$, $c \in (0, 1)$ and $y > 0$ be such that
$$\frac{\log(x/c)}{x} > y. \qquad (1)$$
Then:
$$x < \frac{2}{y} \log\!\left(\frac{1}{yc}\right).$$

Proof. Inequality (1) implies
$$x < \frac{\log(x/c)}{y},$$
and further
$$\log(x/c) < \log\!\left(\frac{\log(x/c)}{yc}\right) = \log\!\left(\frac{1}{yc}\right) + \log\log(x/c) \leq \log\!\left(\frac{1}{yc}\right) + \frac{1}{2}\log(x/c),$$
since it can be easily checked that $\log(t) \leq t/2$ for all $t > 0$. Rearranging gives $\log(x/c) < 2\log(1/(yc))$, and plugging this back into the first display leads to the claim. A short numerical sanity check of this bound is given below.
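As referenced in the proof above, the following is a minimal numerical sanity check of Lemma 2, added here for illustration only; the sampling ranges below are arbitrary choices, not values from the paper. It draws random triples $(x, c, y)$, keeps those satisfying condition (1), and asserts the claimed bound $x < (2/y)\log(1/(yc))$.

```python
import math
import random

def bound_holds(x, c, y):
    """If hypothesis (1) of Lemma 2 holds, check its conclusion."""
    if math.log(x / c) / x <= y:
        return True  # hypothesis (1) fails: nothing to check
    # Under (1) with x >= 1 and c < 1, one has y*c < 1, so the log below is positive.
    return x < (2.0 / y) * math.log(1.0 / (y * c))

random.seed(0)
for _ in range(100_000):
    x = random.uniform(1.0, 1e6)    # x >= 1
    c = random.uniform(1e-9, 1.0)   # c in (0, 1)
    y = random.uniform(1e-9, 10.0)  # y > 0
    assert bound_holds(x, c, y), (x, c, y)
print("Lemma 2 bound holds on all sampled triples satisfying (1).")
```

Such randomized spot-checking does not, of course, prove the lemma; it only guards against a transcription error in the constants of the bound.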
2 Concentration inequalities

In this section, we present concentration inequalities for the key quantities used in our analysis.