Posterior Concentration for Sparse Deep Learning

Neural Information Processing Systems 

We introduce Spike-and-Slab Deep Learning (SS-DL), a fully Bayesian alternative to dropout for improving the generalizability of deep ReLU networks. This new type of regularization enables provable recovery of smooth input-output maps with unknown levels of smoothness. Indeed, we show that the posterior distribution concentrates at the near-minimax rate for α-Hölder smooth maps, performing as well as if we knew the smoothness level α ahead of time. Network attributes such as depth, width, and sparsity level typically depend on the unknown smoothness in order to be optimal. We obviate this constraint with the fully Bayes construction.