Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs
