How Tight Can PAC-Bayes be in the Small Data Regime?

Neural Information Processing Systems 

In this paper, we investigate the question: _Given a small number of datapoints, for example $N = 30$, how tight can PAC-Bayes and test set bounds be made?_ For such small datasets, test set bounds adversely affect generalisation performance by withholding data from the training procedure. In this setting, PAC-Bayes bounds are especially attractive, due to their ability to use all the data to simultaneously learn a posterior and bound its generalisation risk. We focus on the case of i.i.d.