Bayesian Backpropagation Over I-O Functions Rather Than Weights
The conventional Bayesian justification of backprop is that it finds the MAP weight vector. As this paper shows, to find the MAP I-O function instead one must add a correction term to backprop. That term biases one toward I-O functions with small description lengths, and in particular favors (some kinds of) feature selection, pruning, and weight sharing.
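A minimal sketch of the distinction the abstract draws, not the paper's derivation: standard MAP-over-weights training minimizes -log P(D|w) - log P(w), while MAP over I-O functions adds a further correction term to that objective. The network, the Gaussian likelihood and prior, and the specific log-Jacobian form, sign, and scale of the correction used below are illustrative assumptions standing in for the term actually derived in the paper.

```python
# Sketch: backprop on a MAP-over-weights objective plus an additive
# correction term (hypothetical log-Jacobian penalty, NOT the paper's term).
import jax
import jax.numpy as jnp

def net(w, x):
    """Tiny one-hidden-layer net; w is a flat parameter vector."""
    W1 = w[:6].reshape(2, 3)   # 2 inputs -> 3 hidden units
    W2 = w[6:9].reshape(3, 1)  # 3 hidden units -> 1 output
    return jnp.tanh(x @ W1) @ W2

def nll(w, X, y):
    """Gaussian negative log-likelihood (sum of squared errors)."""
    return 0.5 * jnp.sum((net(w, X).ravel() - y) ** 2)

def log_prior(w):
    """Gaussian prior over weights (i.e. weight decay)."""
    return -0.5 * jnp.sum(w ** 2)

def correction(w, X):
    """Illustrative stand-in for the paper's correction term:
    0.5 * log det(J J^T), J = d(outputs on training inputs)/d(weights)."""
    J = jax.jacobian(lambda w_: net(w_, X).ravel())(w)  # shape (n_outputs, n_weights)
    return 0.5 * jnp.linalg.slogdet(J @ J.T + 1e-6 * jnp.eye(J.shape[0]))[1]

def objective(w, X, y):
    # MAP-over-weights objective plus the (assumed) correction term.
    return nll(w, X, y) - log_prior(w) + correction(w, X)

# Toy dataset and plain gradient descent on the corrected objective.
X = jnp.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = jnp.array([0., 1., 1., 0.])
w = 0.1 * jax.random.normal(jax.random.PRNGKey(0), (9,))

grad_fn = jax.jit(jax.grad(objective))
for _ in range(200):
    w = w - 0.05 * grad_fn(w, X, y)
```

Dropping the `correction` term from `objective` recovers ordinary MAP-over-weights backprop; the point of the abstract is that only the corrected objective targets the MAP I-O function.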
Neural Information Processing Systems
Dec-31-1994