Reviews: Dichotomize and Generalize: PAC-Bayesian Binary Activated Deep Neural Networks

Neural Information Processing Systems 

This work studies the optimization of PAC-Bayes bounds for deep neural networks with binary activations. One of the stated contributions of the paper, showing how to optimize even though the binary activations have no usable derivative, is in fact a known technique in the literature on variational inference. This somewhat undermines the impact of the work, though importing these ideas into the PAC-Bayes community is a welcome contribution. The other contribution is obtaining nonvacuous bounds, and the tightness of these bounds is impressive. I have a few issues to raise with the introduction, which I would like addressed in revisions. First, the authors write: "Although informative, these results upper bound the prediction error of a (stochastic) neural network with perturbed weights, which is not the one used to predict in practice".