Daniel M. Roy
Data-dependent PAC-Bayes priors via differential privacy
Gintare Karolina Dziugaite, Daniel M. Roy
The Probably Approximately Correct (PAC) Bayes framework (McAllester, 1999) can incorporate knowledge about the learning algorithm and (data) distribution through the use of distribution-dependent priors, yielding tighter generalization bounds on data-dependent posteriors. Using this flexibility, however, is difficult, especially when the data distribution is presumed to be unknown. We show how an ε-differentially private data-dependent prior yields a valid PAC-Bayes bound, and then show how non-private mechanisms for choosing priors can also yield generalization bounds. As an application of this result, we show that a Gaussian prior mean chosen via stochastic gradient Langevin dynamics (SGLD; Welling and Teh, 2011) leads to a valid PAC-Bayes bound, given control of the 2-Wasserstein distance to an ε-differentially private stationary distribution. We study our data-dependent bounds empirically, and show that they can be nonvacuous even when other distribution-dependent bounds are vacuous.
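For context, a minimal statement of the fixed-prior bound that this paper generalizes (the Langford-Seeger form with Maurer's constants; the notation $\hat{R}_S(Q)$ and $R_D(Q)$ for the empirical and true Gibbs risk is ours, not the paper's): with probability at least $1-\delta$ over $S \sim \mathcal{D}^n$, simultaneously for all posteriors $Q$,

\[ \mathrm{kl}\!\left(\hat{R}_S(Q) \,\big\|\, R_D(Q)\right) \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{n}, \]

where $\mathrm{kl}(q \,\|\, p)$ denotes the KL divergence between Bernoulli distributions with parameters $q$ and $p$. The paper's contribution is to let the prior depend on the sample: if the map $S \mapsto P(S)$ is $\varepsilon$-differentially private, a valid PAC-Bayes bound of a similar shape still holds.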
Fast-rate PAC-Bayes Generalization Bounds via Shifted Rademacher Processes
Jun Yang, Shengyang Sun, Daniel M. Roy
The developments of Rademacher complexity and PAC-Bayesian theory have been largely independent. One exception is the PAC-Bayes theorem of Kakade, Sridharan, and Tewari [21], which is established via Rademacher complexity theory by viewing Gibbs classifiers as linear operators. The goal of this paper is to extend this bridge between Rademacher complexity and state-of-the-art PAC-Bayesian theory. We first demonstrate that one can match the fast rate of Catoni's PAC-Bayes bounds [8] using shifted Rademacher processes [27, 43, 44]. We then derive a new fast-rate PAC-Bayes bound in terms of the "flatness" of the empirical risk surface on which the posterior concentrates. Our analysis establishes a new framework for deriving fast-rate PAC-Bayes bounds and yields new insights on PAC-Bayesian theory.
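For reference, Catoni's bound [8] is commonly quoted in the following form (a hedged sketch, with notation as above): for any fixed $C > 0$, with probability at least $1-\delta$ over $S \sim \mathcal{D}^n$, simultaneously for all posteriors $Q$,

\[ R_D(Q) \;\le\; \frac{1 - \exp\!\left(-C\,\hat{R}_S(Q) - \frac{\mathrm{KL}(Q\|P) + \ln\frac{1}{\delta}}{n}\right)}{1 - e^{-C}}. \]

Applying $1 - e^{-x} \le x$ to the numerator relaxes this to the linear form

\[ R_D(Q) \;\le\; \frac{C}{1 - e^{-C}} \left( \hat{R}_S(Q) + \frac{\mathrm{KL}(Q\|P) + \ln\frac{1}{\delta}}{C\,n} \right), \]

in which the complexity term decays at the fast rate $1/n$ rather than the $1/\sqrt{n}$ rate of classical PAC-Bayes bounds; this is the rate the shifted-Rademacher argument matches.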
Measuring the reliability of MCMC inference with bidirectional Monte Carlo
Roger B. Grosse, Siddharth Ancha, Daniel M. Roy
Markov chain Monte Carlo (MCMC) is one of the main workhorses of probabilistic inference, but it is notoriously hard to measure the quality of approximate posterior samples. This challenge is particularly salient in black box inference methods, which can hide details and obscure inference failures. In this work, we extend the recently introduced bidirectional Monte Carlo [GGA15] technique to evaluate MCMC-based posterior inference algorithms. By running annealed importance sampling (AIS) chains both from prior to posterior and vice versa on simulated data, we upper bound in expectation the symmetrized KL divergence between the true posterior distribution and the distribution of approximate samples. We integrate our method into two probabilistic programming languages, WebPPL [GS] and Stan [CGHL+ p], and validate it on several models and datasets. As an example of how our method can be used to guide the design of inference algorithms, we apply it to study the effectiveness of different model representations in WebPPL and Stan.
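To make the evaluation protocol concrete, here is a minimal sketch of bidirectional Monte Carlo in Python (all names and signatures here, log_f, transition, sample_prior, sample_posterior, are hypothetical placeholders, not the paper's WebPPL or Stan integration): forward AIS yields a stochastic lower bound on log p(y), reverse AIS started from an exact posterior sample (available because the data are simulated) yields a stochastic upper bound, and the gap upper bounds the symmetrized KL divergence in expectation.

    import numpy as np

    def ais_log_weight(x0, log_f, betas, transition, rng):
        # One annealed importance sampling (AIS) chain.
        # log_f(x, beta): log of the unnormalized intermediate density
        #   f_beta(x) = prior(x) * likelihood(x)**beta.
        # transition(x, beta, rng): any MCMC kernel leaving f_beta invariant.
        # Returns log w, where E[w] is the ratio of the endpoint
        # normalizing constants.
        x, log_w = x0, 0.0
        for b_prev, b in zip(betas[:-1], betas[1:]):
            log_w += log_f(x, b) - log_f(x, b_prev)  # reweight at new temperature
            x = transition(x, b, rng)                # then move under f_b
        return log_w

    def bdmc_bounds(sample_prior, sample_posterior, log_f, betas,
                    transition, n_chains=100, seed=0):
        # Bidirectional Monte Carlo on simulated data (hypothetical API).
        # Forward chains (beta: 0 -> 1) give a stochastic lower bound on
        # log p(y); reverse chains started from an exact posterior sample
        # (available because y was simulated from the model) give a
        # stochastic upper bound. The gap upper bounds, in expectation,
        # the symmetrized KL divergence between the true posterior and
        # the distribution of the forward chains' approximate samples.
        rng = np.random.default_rng(seed)
        fwd = [ais_log_weight(sample_prior(rng), log_f, betas,
                              transition, rng)
               for _ in range(n_chains)]
        rev = [ais_log_weight(sample_posterior(rng), log_f, betas[::-1],
                              transition, rng)
               for _ in range(n_chains)]
        log_z_lower = float(np.mean(fwd))    # E[log w_fwd] <= log p(y) (Jensen)
        log_z_upper = -float(np.mean(rev))   # -E[log w_rev] >= log p(y)
        return log_z_lower, log_z_upper, log_z_upper - log_z_lower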