Supplementary Material for: Fast rates for prediction with limited expert advice

Inria

1 Notation
We denote $R(\cdot)$ the expected risk function: $R(\cdot) := \mathbb{E}[\ell(\cdot, Y)]$, and define $\widetilde{R}$ similarly, with $\ell$ replaced by $\widetilde{\ell} := \ell - \inf \ell$. The excess loss of any predictor remains unchanged when replacing $\ell$ by $\widetilde{\ell}$, since the $\inf \ell$ terms cancel in any difference of expected losses. Therefore, without loss of generality we can assume that the loss function always takes values in $[0, B]$, which we do for the remainder of the paper.

The following lemma is technical; it will be used in the proof of the instance-dependent bound (Theorem M-5.3).

Lemma 2. Let $x \geq 1$, $c \in (0, 1)$ and $y > 0$ be such that
$$\frac{\log(x/c)}{x} > y. \tag{1}$$
Then:
$$x < \frac{2}{y} \log\!\left(\frac{1}{yc}\right).$$

Proof. Inequality (1) implies $x < \frac{\log(x/c)}{y}$, and further
$$\log(x/c) < \log\!\left(\frac{1}{yc}\right) + \log\log(x/c) \leq \log\!\left(\frac{1}{yc}\right) + \frac{1}{2}\log(x/c),$$
since it can be easily checked that $\log(t) \leq t/2$ for all $t > 0$. Solving for $\log(x/c)$ gives $\log(x/c) < 2\log(1/(yc))$, and plugging back into the previous display leads to the claim.
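As an aside, the bound of Lemma 2 can be checked numerically. The following is a minimal sanity-check script (illustrative only, not part of the paper): it samples random triples $(x, c, y)$, keeps those satisfying hypothesis (1), and verifies the claimed conclusion on each.

    import numpy as np

    # Minimal numerical sanity check of Lemma 2 (illustrative, not from the paper).
    # Hypothesis (1): log(x/c)/x > y; claimed conclusion: x < (2/y) * log(1/(y*c)).
    rng = np.random.default_rng(0)
    checked = 0
    for _ in range(100_000):
        x = rng.uniform(1.0, 100.0)        # x >= 1
        c = rng.uniform(1e-6, 1.0 - 1e-6)  # c in (0, 1)
        y = rng.uniform(1e-6, 10.0)        # y > 0
        if np.log(x / c) / x > y:          # hypothesis (1) holds
            # y*c < 1 is implied by (1), so the logarithm below is positive
            assert x < (2.0 / y) * np.log(1.0 / (y * c))
            checked += 1
    print(f"Lemma 2 bound verified on {checked} sampled triples.")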
2 Concentration inequalities

In this section, we present concentration inequalities for the key quantities used in our analysis.