Distributed Low-Communication Training with Decoupled Momentum Optimization