MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
