On Masked Pre-training and the Marginal Likelihood
