FlexDeMo: Decoupled Momentum Optimization for Fully and Hybrid Sharded Training
