MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization

Open in new window