On the Dirichlet Prior and Bayesian Regularization

Harald Steck and Tommi S. Jaakkola

Neural Information Processing Systems 

In the Bayesian approach, regularization is achieved by specifying a prior distribution over the parameters and subsequently averaging over the posterior distribution. This regularization not only yields smoother parameter estimates than maximum likelihood but also guides the selection of model structures. It was pointed out in [6] that a very large scale parameter of the Dirichlet prior can degrade predictive accuracy due to severe regularization of the parameter estimates. We complement this discussion here and show that a very small scale parameter can lead to over-regularized (overly sparse) model structures when a product of (conjugate) Dirichlet priors is used over multinomial conditional distributions (Section 3). Section 4 demonstrates the effect of the scale parameter and how it can be calibrated. We focus on the class of Bayesian network models throughout this paper.
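To illustrate the smoothing effect described above, the following minimal sketch contrasts the maximum-likelihood estimate of a multinomial with the posterior mean under a symmetric Dirichlet prior. The helper `posterior_mean`, the toy counts, and the choice of a symmetric prior are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def posterior_mean(counts, alpha):
    """Posterior mean of multinomial parameters under a symmetric
    Dirichlet prior with scale parameter alpha.

    With observed counts N_i over K states and pseudo-count alpha/K
    per state, the posterior mean is (N_i + alpha/K) / (N + alpha),
    a smoothed version of the MLE N_i / N.
    """
    counts = np.asarray(counts, dtype=float)
    k = counts.size
    return (counts + alpha / k) / (counts.sum() + alpha)

counts = np.array([8, 2, 0])                  # toy counts for a 3-state variable
mle = counts / counts.sum()                   # MLE assigns probability 0 to state 3
smooth = posterior_mean(counts, alpha=3.0)    # mild smoothing toward uniform
heavy = posterior_mean(counts, alpha=1000.0)  # very large scale: estimates pulled
                                              # close to 1/3 each (severe regularization)
```

As the scale parameter grows, the estimates are pulled toward the uniform distribution regardless of the data, which is the over-regularization of parameter estimates discussed in [6]; the paper's complementary point concerns how a very small scale parameter over-regularizes the learned structure instead.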
