Curriculum-Guided Layer Scaling for Language Model Pretraining
