Bayesian inference with finitely wide neural networks

May-25-2023–arXiv.org Artificial Intelligence

In his seminal work [1], Neal showed that a shallow but infinitely wide random neural network is, in the statistical sense, a Gaussian process (GP) [2]. This idea also inspired subsequent work [3, 4] that interprets neural networks with specific nonlinear activation units as kernel machines. More recent reports [5, 6] further claimed the equivalence between GPs and deep neural networks in which every hidden layer is infinitely wide. Consequently, machine learning practitioners can perform Bayesian inference by treating a deep, wide neural network as a GP and exploiting the analytic marginal and conditional properties of the multivariate Gaussian distribution. Otherwise, obtaining a predictive distribution requires gradient-based training combined with bootstrap sampling [7]. In reality, however, all neural networks have finite width, so the deviation from Gaussianity calls for a quantitative account: practitioners may want to know, for example, the corrections to the predictive mean and variance in a regression task. Yaida [8] and colleagues [9] proposed a perturbative approach that computes the multivariate cumulants by directly applying Wick's contraction theorem.
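The two ideas in the abstract can be made concrete in a small toy setting. This is a minimal sketch under stated assumptions, not the authors' method. It assumes a one-hidden-layer ReLU network without biases, f(x) = (1/sqrt(n)) * sum_j v_j * relu(w_j . x), with w_j ~ N(0, I) and v_j ~ N(0, sigma_v^2). All function names (`relu_nngp_kernel`, `gp_posterior`, `relu_excess_kurtosis`, `mc_excess_kurtosis`) are hypothetical. The first part does GP prediction by conditioning a multivariate Gaussian under the infinite-width kernel, which for this network is sigma_v^2 / 2 times the degree-1 arc-cosine kernel. The second part shows the simplest finite-width effect, a fourth cumulant that shrinks as 1/n. That result is derived here by an elementary Gaussian-moment (Wick pairing) argument rather than the perturbative expansion of [8, 9].

```python
import math
import random

# Hypothetical helpers for a toy network; an illustration, not the paper's perturbative method.


def relu_nngp_kernel(x, y, sigma_v=1.0):
    """Infinite-width covariance E[f(x) f(y)] for the toy network
    f(x) = (1/sqrt(n)) * sum_j v_j * relu(w_j . x), w_j ~ N(0, I), v_j ~ N(0, sigma_v**2).
    Equals sigma_v**2 / 2 times the degree-1 arc-cosine kernel."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    cos_t = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    theta = math.acos(max(-1.0, min(1.0, cos_t)))  # clip rounding error before acos
    j1 = math.sin(theta) + (math.pi - theta) * math.cos(theta)
    return sigma_v ** 2 * nx * ny * j1 / (2.0 * math.pi)


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):  # forward pass: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward pass: L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X, y, x_star, noise=1e-2, kernel=relu_nngp_kernel):
    """Predictive mean and latent variance at x_star from Gaussian conditioning:
    mean = k*^T (K + noise I)^-1 y,  var = k(x*, x*) - k*^T (K + noise I)^-1 k*."""
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    k_star = [kernel(a, x_star) for a in X]
    mean = sum(k * c for k, c in zip(k_star, cho_solve(L, y)))
    var = kernel(x_star, x_star) - sum(k * c for k, c in zip(k_star, cho_solve(L, k_star)))
    return mean, var


def relu_excess_kurtosis(width):
    """Exact kappa_4 / kappa_2**2 of f(x) at a single input for finite width n.
    Given the activations h_j = relu(w_j . x), f is Gaussian with variance
    S = sigma_v**2 * sum_j h_j**2 / n, so E[f^4] = 3 E[S^2] (three Wick pairings) and
    kappa_4 / kappa_2**2 = 3 Var(h^2) / (n E[h^2]**2) = 3 * (5/4) / (n / 4) = 15 / n
    (evaluated at unit pre-activation variance; the ratio does not depend on it)."""
    return 15.0 / width


def mc_excess_kurtosis(width, samples=100_000, seed=0):
    """Monte Carlo estimate of the same ratio by sampling random finite-width networks."""
    rng = random.Random(seed)
    m2 = m4 = 0.0
    for _ in range(samples):
        # For a unit-norm input and w_j ~ N(0, I), each pre-activation w_j . x is N(0, 1).
        s = sum(max(0.0, rng.gauss(0.0, 1.0)) * rng.gauss(0.0, 1.0) for _ in range(width))
        f = s / math.sqrt(width)
        m2 += f * f
        m4 += f ** 4
    m2 /= samples
    m4 /= samples
    return m4 / m2 ** 2 - 3.0  # the mean of f is exactly zero, so raw moments suffice


# Usage: each input carries an appended constant 1 as a stand-in for a bias term.
X = [(-1.0, 1.0), (0.0, 1.0), (1.0, 1.0)]
y = [1.0, 0.2, 1.0]
print(gp_posterior(X, y, (2.0, 1.0)))
print(relu_excess_kurtosis(10), mc_excess_kurtosis(10))
```

The Cholesky helpers are written out only to keep the sketch dependency-free. With SciPy one would normally use `scipy.linalg.cho_factor` and `scipy.linalg.cho_solve` instead. The Monte Carlo estimate should approach the exact 15/n as the sample count grows, and both vanish as the width n grows, which recovers the Gaussian (GP) limit.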
