Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
