From Acceleration to Saturation: Scaling Behavior of Bootstrapped Language Model Pretraining
