Robust and Parallel Bayesian Model Selection
Zhang, Michael Minyi, Lam, Henry, Lin, Lizhen
Selecting the right model for inference is a crucial task. As our main example, we consider model selection for a normal linear model:

$$Y = X\beta + \epsilon, \quad \epsilon \sim N(0, \sigma^2 I), \qquad (1)$$

where $Y$ is an $N$-dimensional response vector, $X$ is an $N \times D$ design matrix, and $\beta$ is a $D$-dimensional vector of regression parameters. Here the candidate models correspond to subsets of significant variables. In a Bayesian setting, posterior model probabilities give a natural probabilistic evaluation of the candidate models. Depending on the objectives of the data analysis, we may be interested in assessing our belief about which model is "best" or in obtaining predictions with minimum error. Existing procedures for these goals, however, perform poorly in the presence of outliers and contamination. In addition, Markov chain Monte Carlo (MCMC) algorithms for these methods do not scale to big data settings. The goal of this paper is to investigate a "divide-and-conquer" method that integrates with existing Bayesian model selection techniques in a way that is robust to outliers and, moreover, allows us to perform Bayesian model selection in parallel.
Mar-22-2018
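
To make the setup in the abstract concrete, the following is a minimal sketch of variable selection for the normal linear model (1) by enumerating subsets of the columns of $X$ and scoring each with approximate posterior model probabilities. It is not the divide-and-conquer method of the paper: it assumes a uniform prior over models and uses the standard BIC approximation to the marginal likelihood, and the function name `bic_model_probabilities`, the simulated data, and the chosen coefficients are all hypothetical illustrations.

```python
import itertools
import numpy as np


def bic_model_probabilities(X, y):
    """Approximate posterior model probabilities for every column subset of X.

    Assumptions (not from the paper): uniform prior over models and the
    BIC approximation p(M | y) proportional to exp(-BIC_M / 2).
    """
    n, d = X.shape
    models, bics = [], []
    for k in range(d + 1):
        for subset in itertools.combinations(range(d), k):
            if subset:
                Xs = X[:, list(subset)]
                # Least-squares fit of the sub-model Y = X_s beta_s + eps
                beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
                resid = y - Xs @ beta
            else:
                # Null model: no predictors, so the residual is y itself
                resid = y
            rss = float(resid @ resid)
            # Gaussian BIC up to a constant shared by all models
            bics.append(n * np.log(rss / n) + len(subset) * np.log(n))
            models.append(subset)
    bics = np.array(bics)
    # Subtract the minimum before exponentiating for numerical stability
    weights = np.exp(-0.5 * (bics - bics.min()))
    return models, weights / weights.sum()


# Simulated data from model (1): N = 200, D = 3, sigma = 1.
# The second coefficient is zero, so column 1 is an irrelevant variable.
rng = np.random.default_rng(0)
N, D = 200, 3
X = rng.normal(size=(N, D))
beta_true = np.array([2.0, 0.0, -3.0])
y = X @ beta_true + rng.normal(size=N)

models, probs = bic_model_probabilities(X, y)
best = models[int(np.argmax(probs))]
print("highest-probability model (column indices):", best)
```

Exhaustive enumeration visits $2^D$ models and every fit uses all $N$ observations, so this sketch is practical only for small $D$ and moderate $N$. Because it relies on least-squares residuals, a few gross outliers in `y` can also shift the probabilities substantially, which is the sensitivity the abstract refers to.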