Bayesian inference with finitely wide neural networks
arXiv.org Artificial Intelligence
In his seminal work [1], Neal pointed out that a shallow but infinitely wide random neural network is a Gaussian process (GP) [2] in the statistical sense. Subsequent work [3, 4] interpreting neural networks with specific nonlinear activation units as kernel machines was also inspired by this idea. More recent reports [5, 6] further claimed the equivalence between GPs and deep neural networks when each hidden layer in the latter is of infinite width. Consequently, machine learning practitioners can perform Bayesian inference by treating a deep and wide neural network as a GP and exploiting the analytic marginal and conditional properties of the multivariate Gaussian distribution. Otherwise, one needs to employ gradient-based learning and bootstrap sampling to obtain the predictive distribution [7]. In reality, all neural networks have finite width. The deviation from Gaussianity therefore requires a further quantitative account, as practitioners may wonder about the corrections to the predictive mean and variance in, for example, a regression task. Yaida [8] and colleagues [9] proposed a perturbative approach for computing the multivariate cumulants by direct application of Wick's contraction theorem.
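To make the infinite-width baseline concrete, the following is a minimal sketch of GP regression using the analytic conditional of a multivariate Gaussian. It assumes a single hidden layer of ReLU units and uses the arc-cosine form of the resulting kernel that is common in the infinite-width literature; it is not taken from this paper and does not include any finite-width correction. The function names (`relu_nngp_kernel`, `gp_posterior`) and the hyperparameters (`sigma_w`, `sigma_b`, `noise`) are hypothetical choices made for illustration.

```python
import numpy as np


def relu_nngp_kernel(xa, xb, sigma_w=1.0, sigma_b=0.1):
    """Kernel of an infinitely wide one-hidden-layer ReLU network (assumed form).

    xa has shape (n_a, d) and xb has shape (n_b, d). sigma_w and sigma_b are
    illustrative weight and bias standard deviations; sigma_b > 0 keeps the
    normalisation below strictly positive.
    """
    d = xa.shape[1]
    # Input-layer covariances of the pre-activations.
    k_ab = sigma_b**2 + sigma_w**2 * (xa @ xb.T) / d
    k_aa = sigma_b**2 + sigma_w**2 * np.sum(xa**2, axis=1) / d
    k_bb = sigma_b**2 + sigma_w**2 * np.sum(xb**2, axis=1) / d
    norm = np.sqrt(np.outer(k_aa, k_bb))
    cos_t = np.clip(k_ab / norm, -1.0, 1.0)
    theta = np.arccos(cos_t)
    # Gaussian expectation E[relu(u) relu(v)] in closed form.
    expect = norm * (np.sin(theta) + (np.pi - theta) * cos_t) / (2.0 * np.pi)
    return sigma_b**2 + sigma_w**2 * expect


def gp_posterior(x_train, y_train, x_test, noise=1e-2, **kernel_args):
    """Predictive mean and covariance from the Gaussian conditional."""
    k_tt = relu_nngp_kernel(x_train, x_train, **kernel_args)
    k_tt = k_tt + noise * np.eye(len(x_train))
    k_st = relu_nngp_kernel(x_test, x_train, **kernel_args)
    k_ss = relu_nngp_kernel(x_test, x_test, **kernel_args)
    chol = np.linalg.cholesky(k_tt)
    # alpha = (K_tt + noise * I)^{-1} y via two triangular-factor solves.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    cov = k_ss - v.T @ v
    return mean, cov


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_train = rng.normal(size=(20, 3))
    y_train = np.sin(x_train[:, 0])
    x_test = rng.normal(size=(5, 3))
    mean, cov = gp_posterior(x_train, y_train, x_test)
    print("predictive mean:", mean)
    print("predictive variance:", np.diag(cov))
```

The predictive mean and variance returned here are the Gaussian quantities to which finite-width corrections would be applied.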
May-25-2023