FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training
Neural Information Processing Systems
Sparsely activated Mixture-of-Experts (MoE) layers allow modern large language models (LLMs) to scale to trillions of parameters without an excessive increase in computational cost. To further improve training efficiency, pipelining computation and communication has become a promising approach for distributed MoE training. However, existing work primarily schedules tasks within the MoE layer, such as expert computation and all-to-all (A2A) communication, while neglecting other key operations, including multi-head attention (MHA) computation, gating, and all-reduce communication. In this paper, we propose FlowMoE, a scalable framework for scheduling multi-type task pipelines.
Jun-12-2026, 02:56:53 GMT