Reviews: PAC-Bayes Un-Expected Bernstein Inequality

Neural Information Processing Systems 

The paper introduces three exciting ideas to the area of PAC-Bayesian analysis: (1) a new way of using "half-samples" to construct informed priors; (2) offsetting (biasing) the loss estimate by the loss of a reference hypothesis h_* to achieve "fast convergence rates" under a Bernstein condition [even when the loss itself is bounded away from zero]; (3) a new form of Empirical Bernstein inequality, which is combined with PAC-Bayes to exploit low variance [the need for a new inequality and its advantages are not well explained]. The authors compare a bound based on a combination of the three ideas with the PAC-Bayes bound of Maurer (2004) and some other PAC-Bayes bounds, demonstrating the superiority of the new approach. While the work is really exciting, the authors fail to separate the three major contributions clearly. It is not shown how much each of the three novelties contributes to the success of the method. Biasing and informed priors can be easily combined with the bound of Tolstikhin & Seldin (2013) [TS], and this comparison should be added.
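For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of how the PAC-Bayes-kl bound of Maurer (2004) is typically evaluated numerically, assuming losses in [0, 1] and the standard form kl(empirical loss || true loss) <= (KL(posterior || prior) + ln(2 sqrt(n) / delta)) / n. The function names (`binary_kl`, `kl_upper_inverse`, `maurer_bound`) are hypothetical and chosen for illustration only; this is not the authors' code and it does not implement the new bound under review.

```python
import math


def binary_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, 0.0), 1.0)
    q = min(max(q, eps), 1.0 - eps)  # keep q away from 0 and 1 to avoid log(0)
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def kl_upper_inverse(emp_loss, budget, tol=1e-10):
    """Largest q >= emp_loss with binary_kl(emp_loss, q) <= budget (bisection)."""
    lo, hi = emp_loss, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(emp_loss, mid) > budget:
            hi = mid
        else:
            lo = mid
    return hi  # return the upper end so the result stays a valid upper bound


def maurer_bound(emp_loss, kl_posterior_prior, n, delta):
    """Upper bound on the expected loss of the posterior (assumed form, see lead-in)."""
    budget = (kl_posterior_prior + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_upper_inverse(emp_loss, budget)
```

A call such as `maurer_bound(0.1, 5.0, 1000, 0.05)` returns a value between the empirical loss and 1. An ablation of the kind requested here would compare this baseline with and without the informed prior and the offset by h_*.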