$ϕ$-Balancing for Mixture-of-Experts Training
