Towards Understanding the Mixture-of-Experts Layer in Deep Learning
Neural Information Processing Systems
The Mixture-of-Experts (MoE) layer, a sparsely-activated model controlled by a router, has achieved great success in deep learning. However, the understanding of such an architecture remains elusive. In this paper, we formally study how the MoE layer improves the performance of neural network learning and why the mixture model does not collapse into a single model. Our empirical results suggest that the cluster structure of the underlying problem and the non-linearity of the experts are pivotal to the success of MoE. This motivates us to consider a challenging classification problem with intrinsic cluster structures.
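A minimal sketch of what the abstract describes, not the paper's implementation: a sparsely-activated MoE layer where a router selects one non-linear expert per input (top-1 routing). Names such as `SimpleMoE`, `num_experts`, and `hidden_dim` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    """Illustrative sparsely-activated MoE layer with top-1 routing (not the paper's code)."""
    def __init__(self, input_dim, hidden_dim, output_dim, num_experts=4):
        super().__init__()
        # Router: one logit per expert for each input.
        self.router = nn.Linear(input_dim, num_experts)
        # Non-linear experts (the abstract highlights expert non-linearity).
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(input_dim, hidden_dim),
                          nn.ReLU(),
                          nn.Linear(hidden_dim, output_dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):
        gate_logits = self.router(x)                 # (batch, num_experts)
        gate_probs = F.softmax(gate_logits, dim=-1)  # routing weights
        top1 = gate_logits.argmax(dim=-1)            # index of the selected expert per input
        out = torch.zeros(x.size(0), self.experts[0][-1].out_features, device=x.device)
        for i, expert in enumerate(self.experts):
            mask = top1 == i
            if mask.any():
                # Only the selected expert processes each input (sparse activation),
                # scaled by its routing probability.
                out[mask] = gate_probs[mask, i].unsqueeze(-1) * expert(x[mask])
        return out

# Example usage
x = torch.randn(8, 16)
moe = SimpleMoE(input_dim=16, hidden_dim=32, output_dim=2)
print(moe(x).shape)  # torch.Size([8, 2])
```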