Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast

Neural Information Processing Systems 

Mixture-of-Experts (MoE) has emerged as a prominent architecture for scaling model size while maintaining computational efficiency.
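To make the routing mechanism concrete, the following is a minimal sketch (not the paper's implementation) of a standard top-k gated MoE layer in PyTorch; the class name `TopKMoE` and the dimensions `d_model`, `n_experts`, and `top_k` are hypothetical. It illustrates why MoE scales parameters cheaply: each token activates only its top-k experts, and the unchosen experts are never run.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k routed MoE layer (assumed, generic design)."""

    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        # Each expert is a standard feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):
        # x: (tokens, d_model)
        logits = self.router(x)                         # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)            # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens routed to expert e at rank k
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out  # unchosen experts contribute nothing to this output
```

Under this (assumed) formulation, compute per token depends on `top_k`, not on `n_experts`, which is the efficiency property the abstract refers to; the routing information about the experts left unchosen is exactly what the paper's self-contrast idea targets.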