Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining
