Distributed Sign Momentum with Local Steps for Training Transformers
