Uncertainty Quantification From Scaling Laws in Deep Neural Networks

Ibrahim Elsharkawy, Yonatan Kahn, Benjamin Hooberman

arXiv.org Artificial Intelligence 

Deep learning techniques have improved performance beyond conventional methods in a wide variety of tasks. However, for neural networks in particular, it is not straightforward to assign an uncertainty to the network output as a function of network architecture, training algorithm, and initialization [1]. One approach to uncertainty quantification (UQ) is to treat any individual network as a draw from an ensemble, and to identify the systematic uncertainty with the variance of the neural network outputs over the ensemble [2, 3]. This variance can certainly be measured empirically by training a large ensemble of networks, but it would be advantageous to be able to predict it from first principles. This is possible in the infinite-width limit of multi-layer perceptron (MLP) architectures, where the statistics of the network outputs after training are Gaussian, with mean and variance determined by the neural tangent kernel (NTK) [4-6]. For realistic MLPs with large but finite width n, one can compute corrections to this Gaussian distribution that are perturbative in 1/n [7].