A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent

Neural Information Processing Systems 

We study the generalization error of randomized learning algorithms -- focusing on stochastic gradient descent (SGD) -- using a novel combination of PAC-Bayes and algorithmic stability. Importantly, our generalization bounds hold for all posterior distributions on an algorithm's random hyperparameters, including distributions that depend on the training data.
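For context, bounds of this type build on the classical PAC-Bayes theorem. Below is a minimal sketch of the standard (Maurer-style) form; this is the well-known template such analyses adapt, not the paper's own theorem, and the notation (prior P, posterior Q, risks R and \hat{R}_S, sample size n, confidence \delta) is assumed here rather than taken from the abstract.

% Sketch of a standard PAC-Bayes bound (Maurer's form), for context only;
% the paper's analysis instead places Q over an algorithm's random
% hyperparameters and allows Q to depend on the training data.
% Notation (assumed, not from the abstract):
%   P  -- prior over hypotheses, fixed before seeing the sample S
%   Q  -- posterior over hypotheses, which may depend on S
%   R(h), \hat{R}_S(h) -- expected and empirical risk; n = |S|
% With probability at least 1 - \delta over the draw of S, for all Q:
\[
  \mathbb{E}_{h \sim Q}\!\bigl[R(h)\bigr]
  \;\le\;
  \mathbb{E}_{h \sim Q}\!\bigl[\hat{R}_S(h)\bigr]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}\,.
\]

The key feature highlighted in the abstract is that the guarantee holds simultaneously for all posteriors Q, which is what permits Q to be chosen after, and as a function of, the training data.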