FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training

Jun-12-2026, 02:56:53 GMT–Neural Information Processing Systems

The sparsely activated Mixture-of-Experts (MoE) technique allows the parameter count of modern large language models (LLMs) to scale to the trillion level without an excessive increase in computational cost. To further improve training efficiency, pipelining computation and communication has become a promising approach for distributed MoE training. However, existing work focuses primarily on scheduling tasks within the MoE layer, such as expert computation and all-to-all (A2A) communication, while neglecting other key operations, including multi-head attention (MHA) computation, gating, and all-reduce communication. In this paper, we propose FlowMoE, a scalable framework for scheduling multi-type task pipelines.
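To illustrate why pipelining computation and communication can shorten a training step, the following is a minimal toy model, not FlowMoE's scheduler. It assumes a layer's work is split into chunks, each with one compute task followed by one communication task, and that compute and communication run on two independent resources. The function names and the durations are hypothetical and chosen only for illustration.

```python
def sequential_time(compute, comm):
    """Total time when every task runs one after another with no overlap."""
    return sum(compute) + sum(comm)


def pipelined_time(compute, comm):
    """Total time when the communication of chunk i may overlap with the
    computation of chunk i+1 (two independent resources, in-order chunks)."""
    compute_free = 0.0  # time at which the compute resource becomes idle
    comm_free = 0.0     # time at which the communication resource becomes idle
    for c, m in zip(compute, comm):
        compute_free += c
        # A chunk's communication starts once its compute is done
        # and the previous chunk's communication has finished.
        comm_free = max(comm_free, compute_free) + m
    return comm_free


if __name__ == "__main__":
    # Hypothetical durations (arbitrary time units) for four chunks.
    compute = [2.0, 2.0, 2.0, 2.0]
    comm = [1.0, 1.0, 1.0, 1.0]
    print(sequential_time(compute, comm))  # 12.0
    print(pipelined_time(compute, comm))   # 9.0
```

In this toy setting, all communication except that of the last chunk is hidden behind computation. A scheduler covering more task types, as the abstract describes, would have to handle additional dependencies that this sketch does not model.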

communication, large language model, natural language

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.59)