Bayesian Learning via Stochastic Dynamics

Neural Information Processing Systems 

The attempt to find a single "optimal" weight vector in conventional network training can lead to overfitting and poor generalization. Bayesian methods avoid this, without the need for a validation set, by averaging the outputs of many networks with weights sampled from the posterior distribution given the training data. This sample can be obtained by simulating a stochastic dynamical system that has the posterior as its stationary distribution.

I view neural networks as probabilistic models, and learning as statistical inference. Conventional network learning finds a single "optimal" set of network parameter values, corresponding to maximum likelihood or maximum penalized likelihood inference.
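The sampling idea above can be illustrated with a deliberately tiny sketch. The sketch does not claim to reproduce the procedure used in this paper. It uses unadjusted Langevin dynamics, which is one simple stochastic dynamical system whose stationary distribution approximates the posterior when the step size is small. Everything below is an illustrative assumption: a one-weight "network" y = w·x with Gaussian noise (variance 1), a Gaussian prior w ~ N(0, 1), the toy data, the step size, and the hypothetical helper names `grad_log_posterior`, `langevin_samples`, and `predictive_mean`. Because this toy model is conjugate, its exact posterior is known, so the samples can be checked against it.

```python
import random


def grad_log_posterior(w, xs, ys, prior_var=1.0, noise_var=1.0):
    """Gradient of log p(w | data) for the assumed toy model
    y = w * x + N(0, noise_var), with prior w ~ N(0, prior_var)."""
    grad_prior = -w / prior_var
    grad_lik = sum((y - w * x) * x for x, y in zip(xs, ys)) / noise_var
    return grad_prior + grad_lik


def langevin_samples(xs, ys, n_samples=50000, burn_in=2000, eps=0.05, seed=0):
    """Unadjusted Langevin dynamics:
        w <- w + (eps**2 / 2) * grad log p(w | data) + eps * N(0, 1).
    For small eps the stationary distribution approximates the posterior.
    No Metropolis correction is applied, so a small step-size bias remains."""
    rng = random.Random(seed)
    w = 0.0
    samples = []
    for t in range(burn_in + n_samples):
        drift = 0.5 * eps * eps * grad_log_posterior(w, xs, ys)
        w += drift + eps * rng.gauss(0.0, 1.0)
        if t >= burn_in:  # discard the transient from the arbitrary start
            samples.append(w)
    return samples


def predictive_mean(samples, x_new):
    """Bayesian prediction: average the outputs of the sampled models
    instead of using a single 'optimal' weight."""
    return sum(w * x_new for w in samples) / len(samples)


if __name__ == "__main__":
    # Hypothetical toy data, roughly y = 2x.
    xs = [0.5, 1.0, 1.5, 2.0, 2.5]
    ys = [1.1, 1.9, 3.2, 3.9, 5.1]
    samples = langevin_samples(xs, ys)

    # Exact conjugate posterior for comparison (prior_var = noise_var = 1).
    precision = 1.0 + sum(x * x for x in xs)
    exact_mean = sum(x * y for x, y in zip(xs, ys)) / precision
    m = sum(samples) / len(samples)
    print("sampled posterior mean:", m, "exact:", exact_mean)
    print("predictive mean at x=3:", predictive_mean(samples, 3.0))
```

In this conjugate toy the averaged prediction coincides with the prediction from the single most probable weight. The example therefore only checks that the dynamics sample the posterior correctly. It does not demonstrate the generalization benefit of averaging, which appears with nonlinear networks whose posteriors are not Gaussian.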