Rebuilding Factorized Information Criterion: Asymptotically Accurate Marginal Likelihood
Hayashi, Kohei, Maeda, Shin-ichi, Fujimaki, Ryohei
The marginal log-likelihood is a key concept in Bayesian model identification of latent variable models (LVMs), such as mixture models (MMs), probabilistic principal component analysis, and hidden Markov models (HMMs). Determining the dimensionality of the latent variables is an essential task, both to uncover hidden structures behind the observed data and to mitigate overfitting. In general, LVMs are singular (i.e., the mapping between parameters and probabilistic models is not one-to-one), so classical information criteria that rely on the regularity assumption, such as the Bayesian information criterion (BIC) [Schwarz, 1978], are no longer justified. Since exact evaluation of the marginal log-likelihood is often not available, approximation techniques have been developed using sampling (i.e., Markov chain Monte Carlo methods (MCMCs) [Hastings, 1970]), a variational lower bound (i.e., the variational Bayes methods (VB) [Attias, 1999, Jordan et al., 1999]), or algebraic geometry (i.e., the widely applicable BIC (WBIC) [Watanabe, 2013]). However, model selection using these methods typically incurs a heavy computational cost (e.g., a large number of MCMC samples in a high-dimensional space, or an outer loop for VB/WBIC). In the last few years, a new approximation technique and an accompanying inference method, the factorized information criterion (FIC) and factorized asymptotic Bayesian inference (FAB), have been developed for some binary LVMs [Fujimaki and Morinaga, 2012, Fujimaki and Hayashi, 2012, Hayashi and Fujimaki, 2013, Eto et al., 2014]. Unlike existing methods, which evaluate an approximate marginal log-likelihood separately for each latent variable dimensionality (and therefore need an outer loop for model selection), FAB finds an effective dimensionality via an EM-style alternating optimization procedure.
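For orientation, the quantities referred to above can be written out in standard textbook form. This is only a sketch: the notation is an assumption introduced here and does not come from the text. X denotes N observations, Z the latent variables, θ the parameters, K the latent dimensionality, D_K the number of free parameters of the model with dimensionality K, and 𝒦 the set of candidate dimensionalities.

```latex
% Marginal log-likelihood of an LVM with latent dimensionality K
% (the sum over Z becomes an integral for continuous latent variables)
\log p(X \mid K) = \log \int \sum_{Z} p(X, Z \mid \theta, K)\, p(\theta \mid K)\, d\theta

% BIC approximation; justified only under the regularity assumption,
% with \hat{\theta} the maximum likelihood estimator
\log p(X \mid K) \approx \log p(X \mid \hat{\theta}, K) - \frac{D_K}{2} \log N

% "Outer loop" model selection: one approximate evaluation per candidate K
\hat{K} = \operatorname*{arg\,max}_{K \in \mathcal{K}} \; \widetilde{\log p}(X \mid K)
```

Here \widetilde{\log p} stands for whichever approximation is used (for example a VB lower bound or WBIC). The last line is what makes such methods costly: every candidate K requires its own complete evaluation.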
Apr-22-2015