Distributed Sign Momentum with Local Steps for Training Transformers