Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
