Towards a Comprehensive Scaling Law of Mixture-of-Experts
