MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
