Simulating Posterior Bayesian Neural Networks with Dependent Weights
Nicola Apollonio, Giovanni Franzina, Giovanni Luca Torrisi
The theoretical study of Bayesian neural networks was initiated by Neal [29], who proved that if a shallow Bayesian neural network is initialized with independent Gaussian parameters (i.e., biases and weights), then the output of the network converges in distribution to a Gaussian process as the number of neurons grows large (i.e., in the wide-width limit). This result was extended to Bayesian deep neural networks two decades later (see [16, 22, 26]) and has only recently been made quantitative, using optimal transport theory (see [6] and [33]), Stein's method for Gaussian approximation (see [3, 4, 8, 13]), and alternative techniques (see [7, 11]). Another promising approach is to analyze Bayesian neural networks through the lens of large deviations: the first results in this direction are given in [23], and they were subsequently generalized in [2, 34]. A different perspective is provided by the so-called mean-field analysis of networks (see [27, 15]). The advantage of the Bayesian framework is that it allows one to include in the model both prior knowledge and observed data, through a prior distribution on the network's parameters and a likelihood function, respectively. The emergence of Gaussian processes has helped to explain how large neural networks work and how to make them more efficient, and has motivated the use of Bayesian regression inference methods (see [22]). However, as noted in [28] and [21], the connection with Gaussian processes has also highlighted the limitations of wide neural networks with independent, Gaussian-distributed weights.
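The Gaussian limit attributed to Neal above can be checked numerically with a small Monte Carlo experiment. The following is only a minimal sketch under assumptions that do not come from the text: a tanh activation, unit-variance Gaussian weights and hidden biases, a single scalar input x = 1, no output bias, and the usual 1/sqrt(width) scaling of the output layer. It draws the prior network output at that input many times and reports the sample excess kurtosis, which should shrink toward 0 (its value for a Gaussian) as the width grows. It illustrates the independent-weight prior only, not the posterior or the dependent-weight setting the paper studies.

```python
import math
import random


def sample_output(width, x, rng):
    """One prior draw of a shallow network's output at the scalar input x.

    Assumed setup (illustrative only): tanh activation, independent
    standard normal weights and hidden biases, no output bias.
    """
    total = 0.0
    for _ in range(width):
        w = rng.gauss(0.0, 1.0)  # input-to-hidden weight
        b = rng.gauss(0.0, 1.0)  # hidden bias
        v = rng.gauss(0.0, 1.0)  # hidden-to-output weight
        total += v * math.tanh(w * x + b)
    # The 1/sqrt(width) scaling keeps the output variance of order one.
    return total / math.sqrt(width)


def excess_kurtosis(samples):
    """Sample excess kurtosis; equals 0 for an exactly Gaussian law."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    fourth = sum((s - mean) ** 4 for s in samples) / n
    return fourth / var ** 2 - 3.0


rng = random.Random(0)  # fixed seed so the run is reproducible
kurtosis_by_width = {}
for width in (1, 10, 200):
    draws = [sample_output(width, 1.0, rng) for _ in range(3000)]
    kurtosis_by_width[width] = excess_kurtosis(draws)
    print(f"width={width:4d}  excess kurtosis={kurtosis_by_width[width]:+.3f}")
```

Excess kurtosis is used here as a cheap one-number diagnostic of non-Gaussianity at a single input. It does not test joint Gaussianity across several inputs, which is what convergence to a Gaussian process actually requires.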
Jul-31-2025