Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
