FlexDeMo: Decoupled Momentum Optimization for Fully and Hybrid Sharded Training