On Masked Pre-training and the Marginal Likelihood