Review for NeurIPS paper: Bayesian Deep Learning and a Probabilistic Perspective of Generalization


Summary and Contributions: This paper mixes high-level conceptual discussion with a variety of experimental results, all under the umbrella of generalization in (Bayesian) deep learning. More concretely, the central argument of the paper is that Bayesian learning should primarily be viewed as marginalizing over different plausible hypotheses of the data, instead of relying on a single hypothesis (as standard deep learning does). The ultimate goal is thus to accurately estimate the posterior _predictive_ distribution (over outputs), rather than to accurately approximate the posterior distribution (over weights). The authors therefore recommend that Bayesian methods focus their efforts on carefully representing the posterior in the regions that contribute most to the predictive distribution. In this line of thought, they further argue that deep ensembles, one of the state-of-the-art approaches for obtaining well-calibrated predictive distributions, do effectively approximate the Bayesian model average (even if the individual ensemble members are not actually samples from the posterior), and thus should not be considered in competition with Bayesian methods.
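The Bayesian model average the summary refers to is the posterior predictive p(y|x, D) = ∫ p(y|x, w) p(w|D) dw, and a deep ensemble approximates it by uniformly averaging the members' predictive distributions. A minimal sketch of that averaging step, using hypothetical logits standing in for the outputs of M=3 trained ensemble members (not the paper's actual models or data):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits from M=3 ensemble members for one input, 4 classes.
member_logits = np.array([
    [2.0, 0.5, -1.0, 0.0],
    [1.0, 1.5, -0.5, 0.2],
    [2.5, -0.2, 0.0, -1.0],
])

# Each member's predictive distribution p(y | x, w_m).
member_probs = softmax(member_logits)

# Deep-ensemble prediction: a uniform average over members, viewed as a
# crude Monte Carlo approximation of the Bayesian model average
#   p(y | x, D) ≈ (1/M) * sum_m p(y | x, w_m).
bma_probs = member_probs.mean(axis=0)

print(bma_probs)  # a valid probability vector over the 4 classes
```

The averaging happens in probability space, not logit space, which is what makes each member's contribution a proper predictive distribution being marginalized over.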