SMILE: Scaling Mixture-of-Experts with Efficient Bi-level Routing
