Merging Experts into One: Improving Computational Efficiency of Mixture of Experts
