Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights

Balasubramanian, Krishnakumar, Ross, Nathan

arXiv.org Machine Learning 

Neal (1996) showed that the output distribution of a single-hidden-layer neural network converges to a Gaussian in the infinite-width limit. Our main result establishes Gaussian approximation bounds, in the Wasserstein-1 distance, between the finite-dimensional distributions (FDDs) of wide neural networks and their Gaussian process limits, under general weight distributions satisfying mild moment conditions and assuming a Lipschitz activation function. Typically, each layer also includes bias parameters; however, setting them to zero does not affect our results, and they are therefore omitted; see Remark 1.5.
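As a quick numerical illustration of the phenomenon the bounds quantify (not of the paper's proof technique), the sketch below samples outputs of one-hidden-layer networks at a fixed input and estimates their Wasserstein-1 distance to the Gaussian limit as the width grows. The choices here are assumptions for the demo: tanh as the Lipschitz activation, i.i.d. Rademacher (non-Gaussian) weights to reflect the universality over weight distributions, zero biases as in the paper's convention, and a plug-in two-sample estimate of the Wasserstein-1 distance; the helper names `sample_outputs` and `w1` are ours.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

def sample_outputs(width, n_samples, x, rng, batch=2000):
    """Sample outputs f(x) = width^{-1/2} v^T tanh(W x) of one-hidden-layer
    networks with i.i.d. Rademacher (+/-1) weights and no bias terms."""
    out = np.empty(n_samples)
    for i in range(0, n_samples, batch):  # batch to keep memory modest
        b = min(batch, n_samples - i)
        W = rng.choice([-1.0, 1.0], size=(b, width, x.size))  # hidden weights
        v = rng.choice([-1.0, 1.0], size=(b, width))          # output weights
        out[i:i + b] = (v * np.tanh(W @ x)).sum(axis=1) / np.sqrt(width)
    return out

def w1(a, b):
    """Plug-in Wasserstein-1 distance between two equal-size samples:
    mean absolute difference of the sorted samples."""
    return np.abs(np.sort(a) - np.sort(b)).mean()

x = np.array([1.0, -0.5, 2.0])  # one fixed input point of the FDD
n = 8000

# Variance of the limiting Gaussian at x: E[tanh(w.x)^2] for Rademacher w,
# computed exactly by enumerating the 2^3 sign patterns.
sigma2 = np.mean([np.tanh(np.dot(s, x)) ** 2
                  for s in product([-1.0, 1.0], repeat=3)])
gauss = np.sqrt(sigma2) * rng.standard_normal(n)  # samples from the limit law

# Estimated W1 distance to the Gaussian limit at a narrow and a wide width.
dists = {m: w1(sample_outputs(m, n, x, rng), gauss) for m in (4, 512)}
print(dists)
```

At width 4 the output law is visibly discrete and far from Gaussian, while at width 512 the estimated distance is dominated by sampling noise, consistent with the width-dependent rates the paper establishes.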