Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
Neural Information Processing Systems
The Mixture of Experts (MoE) paradigm provides a powerful way to decompose dense layers into smaller, modular computations often more amenable to human interpretation, debugging, and editability. However, a major challenge lies in the computational cost of scaling the number of experts high enough to achieve fine-grained specialization. In this paper, we propose the Multilinear Mixture of Experts (μMoE) layer to address this, focusing on vision models.
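Although only the abstract is given here, the core idea (a soft mixture of linear experts whose stacked weight tensor is never materialized, but instead represented through a low-rank factorization) can be illustrated compactly. The sketch below is not the authors' implementation: the class names, the CP-style rank-R factorization, and all shapes and hyperparameters are illustrative assumptions made for this example.

```python
# Minimal sketch (assumed names, not the paper's code): a dense soft MoE layer
# versus a variant whose (N, d_out, d_in) expert weight tensor is replaced by a
# rank-R CP factorization, so cost no longer scales with N * d_in * d_out.
import torch
import torch.nn as nn


class DenseSoftMoE(nn.Module):
    """Soft MoE: output is the gate-weighted sum of N linear experts."""

    def __init__(self, d_in, d_out, n_experts):
        super().__init__()
        self.gate = nn.Linear(d_in, n_experts)
        # Full expert weight tensor: memory and compute grow with N * d_out * d_in.
        self.experts = nn.Parameter(torch.randn(n_experts, d_out, d_in) * 0.02)

    def forward(self, x):                          # x: (batch, d_in)
        a = self.gate(x).softmax(dim=-1)           # (batch, N) expert coefficients
        y = torch.einsum("noi,bi->bno", self.experts, x)  # every expert's output
        return torch.einsum("bn,bno->bo", a, y)    # gate-weighted combination


class CPFactorizedMoE(nn.Module):
    """Same computation with the expert tensor held in CP-factorized form:
    W[n, o, i] = sum_r U_exp[n, r] * U_out[o, r] * U_in[i, r]."""

    def __init__(self, d_in, d_out, n_experts, rank):
        super().__init__()
        self.gate = nn.Linear(d_in, n_experts)
        self.U_in = nn.Parameter(torch.randn(d_in, rank) * 0.02)        # input factor
        self.U_out = nn.Parameter(torch.randn(d_out, rank) * 0.02)      # output factor
        self.U_exp = nn.Parameter(torch.randn(n_experts, rank) * 0.02)  # expert factor

    def forward(self, x):                          # x: (batch, d_in)
        a = self.gate(x).softmax(dim=-1)           # (batch, N)
        z = x @ self.U_in                          # (batch, R): project the input
        g = a @ self.U_exp                         # (batch, R): mix expert factors
        return (z * g) @ self.U_out.t()            # (batch, d_out)
```

Because the gate weights, the input, and the expert factors are contracted independently and only combined in the rank dimension, the number of experts can be scaled up without the cubic parameter blow-up of the dense formulation; this is the general flavor of the factorized forward pass the abstract alludes to, with the specific decompositions left to the paper.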
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning (0.36)
    - Vision (0.40)