DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models
