A PAC-Bayes oracle inequality for sparse neural networks

Maximilian F. Steffen and Mathias Trabs

arXiv.org Machine Learning 

Driven by the enormous success of neural networks in a broad spectrum of machine learning applications, see Goodfellow et al. [16] and Schmidhuber [29] for an introduction, the theoretical understanding of network-based methods is a dynamic and flourishing research area at the intersection of mathematical statistics, optimization and approximation theory. In addition to theoretical guarantees, uncertainty quantification is an important and challenging problem for neural networks and has motivated the introduction of Bayesian neural networks, where a distribution is learned for each network weight, see Graves [17] and Blundell et al. [8] and numerous subsequent articles. In this work we study the Gibbs posterior distribution for a stochastic neural network. In a nonparametric regression problem, we show that the corresponding estimator achieves a minimax-optimal prediction risk bound up to a logarithmic factor. Moreover, the method is adaptive with respect to the unknown regularity and structure of the regression function. While early theoretical foundations for neural nets are summarized by Anthony & Bartlett [4], the excellent approximation properties of deep neural nets, especially with the ReLU activation function, have been discovered in recent years, see e.g.
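For orientation, a standard construction of the Gibbs posterior in nonparametric regression (a generic textbook form, not necessarily the exact definition used in the paper) starts from observations $(X_i, Y_i)_{i=1}^n$, a prior $\pi$ on the parameters $\theta$ of a sparse network $f_\theta$ and an inverse temperature $\lambda > 0$:

$$\hat\rho_\lambda(\mathrm{d}\theta) \;\propto\; \exp\bigl(-\lambda n R_n(\theta)\bigr)\,\pi(\mathrm{d}\theta), \qquad R_n(\theta) \;=\; \frac{1}{n}\sum_{i=1}^n \bigl(Y_i - f_\theta(X_i)\bigr)^2.$$

A PAC-Bayes oracle inequality for such a posterior typically states that, with probability at least $1-\varepsilon$,

$$\mathbb{E}_{\theta\sim\hat\rho_\lambda}\bigl[R(\theta)\bigr] \;\le\; \inf_{\rho \ll \pi}\Bigl\{ C_1\,\mathbb{E}_{\theta\sim\rho}\bigl[R(\theta)\bigr] + \frac{C_2}{n}\Bigl(\mathrm{KL}(\rho\,\Vert\,\pi) + \log\tfrac{1}{\varepsilon}\Bigr)\Bigr\},$$

where $R$ is the prediction risk, $\mathrm{KL}$ the Kullback-Leibler divergence and $C_1, C_2$ constants depending on $\lambda$; the precise risk notion, constants and sparsity-inducing prior in the paper may differ from this sketch. Choosing $\rho$ concentrated around a well-approximating sparse network then trades off approximation error against the complexity term $\mathrm{KL}(\rho\,\Vert\,\pi)$, which is the standard route to minimax-optimal rates up to logarithmic factors.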
