MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts

Neural Information Processing Systems 

We theoretically prove and numerically demonstrate that MomentumSMoE is more stable and robust than SMoE.
