MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs
Xinfeng Xia, Jiacheng Liu, Xiaofeng Hou, Peng Tang, Mingxuan Zhang, Wenfeng Wang, Chao Li
arXiv.org Artificial Intelligence
Mixture-of-Experts (MoE) models, the state-of-the-art in large-scale AI, achieve high quality by sparsely activating parameters. However, their reliance on routing between a few monolithic experts via a top-k mechanism creates a "quality cliff", offering only a few coarse-grained operating points. This inflexibility forces a difficult trade-off between cost and quality, preventing adaptation to diverse Service Level Objectives (SLOs) and leading to significant resource over-provisioning. This paper introduces MoE-Prism, a model-system co-design that transforms rigid MoE models into elastic services. Our methodology is divided into two phases. First, an Offline Refactoring Engine systematically deconstructs monolithic experts into fine-grained "sub-experts." This engine employs a partitioning optimization solver that uses a metaheuristic-based approach to group neurons, preserving functional locality without requiring retraining. Second, an Online Scheduling Engine leverages this new elasticity through QoS-aware scheduling. It implements specialized policies to solve complex system problems, including maximizing throughput in cloud deployments and managing latency-optimized offloading for memory-constrained devices. Our evaluation across three different MoE models shows that MoE-Prism provides over 4 times more distinct, stable operating points than the baseline. This allows an AI service to dynamically improve throughput by up to 19.9% under a strict latency budget or reduce latency by up to 10.36% under limited resources. MoE-Prism provides the critical "control knob" to bridge the model-system gap, enabling the next generation of adaptive, efficient, and QoS-aware AI services.
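The idea of deconstructing an expert into sub-experts without retraining can be illustrated with a small sketch. It is not the paper's Offline Refactoring Engine: the abstract describes a metaheuristic solver for grouping neurons, whereas this sketch simply assumes contiguous, equal-sized groups, a two-layer feed-forward expert with a ReLU activation, and hypothetical names (`split_expert`, `expert_forward`). It shows only the underlying identity: because the activation is elementwise, partitioning the hidden neurons yields sub-experts whose outputs sum exactly to the original expert's output.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def expert_forward(w_in, w_out, x):
    # Two-layer feed-forward expert: project up, activate, project down.
    return w_out @ relu(w_in @ x)


def split_expert(w_in, w_out, num_sub_experts):
    # Partition hidden neurons into contiguous groups (an assumption made
    # for illustration; the paper uses a metaheuristic grouping instead).
    hidden = w_in.shape[0]
    groups = np.array_split(np.arange(hidden), num_sub_experts)
    # Each sub-expert keeps the rows of w_in and the matching columns of w_out.
    return [(w_in[g, :], w_out[:, g]) for g in groups]


rng = np.random.default_rng(0)
d_model, d_hidden = 8, 32  # toy sizes, chosen arbitrarily
w_in = rng.standard_normal((d_hidden, d_model))
w_out = rng.standard_normal((d_model, d_hidden))
x = rng.standard_normal(d_model)

sub_experts = split_expert(w_in, w_out, num_sub_experts=4)

full = expert_forward(w_in, w_out, x)
# Activating every sub-expert reproduces the monolithic expert.
recombined = sum(expert_forward(a, b, x) for a, b in sub_experts)
# Activating only some sub-experts gives a cheaper, approximate output.
partial = sum(expert_forward(a, b, x) for a, b in sub_experts[:2])
```

Here `recombined` matches `full` up to floating-point error, while `partial` uses half the hidden neurons. Choosing which sub-experts to activate for a given request is the job of the routing and scheduling layers, which this sketch does not model.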
Oct-23-2025
- Country:
- Asia
- Europe > Italy
- Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States
- California > San Diego County
- Carlsbad (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- Massachusetts > Suffolk County
- Boston (0.04)
- Genre:
- Research Report (0.82)
- Industry:
- Information Technology > Services (0.48)
- Technology: