Training Trajectories of Language Models Across Scales
