Rethinking Memory and Communication Costs for Efficient Data Parallel Training of Large Language Models
Neural Information Processing Systems
Recently, various strategies for distributed training of large language models (LLMs) have been proposed. By categorizing them into basic and composite strategies, we find that existing basic strategies offer limited options in certain scenarios, leaving considerable room for improvement in training speed. In this paper, we rethink the impact of memory and communication costs on the training speed of LLMs, taking into account the performance disparity between intra-group and inter-group communication, and then propose a new set of basic strategies named the Partial Redundancy Optimizer (PaRO). PaRO Data Parallelism (PaRO-DP) accelerates LLM training through refined partitioning of model states and tailored training procedures.
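The abstract contrasts fully sharded model states with a partial-redundancy layout that exploits the gap between fast intra-group links and slower inter-group links. The following minimal Python sketch illustrates the general idea only; the group sizes and function names are hypothetical and this is not the paper's implementation.

    # Minimal sketch (hypothetical, not the paper's code) contrasting a fully
    # sharded layout with a partial-redundancy layout: shard model states
    # across nodes, replicate within each node, so that gathering a shard
    # stays on the fast intra-node links.

    NUM_NODES = 4          # inter-node groups (slow links between them)
    GPUS_PER_NODE = 8      # intra-node group size (fast links within a node)
    WORLD_SIZE = NUM_NODES * GPUS_PER_NODE

    def full_sharding_owner(rank: int) -> int:
        """Fully sharded (ZeRO-3-style): each rank owns a unique
        1/WORLD_SIZE shard, so reassembling parameters must cross the
        slow inter-node links."""
        return rank

    def partial_redundancy_shard(rank: int) -> int:
        """Partial redundancy: the model state is split into NUM_NODES
        shards, and every GPU on a node holds a replica of that node's
        shard, trading extra memory for cheaper communication."""
        return rank // GPUS_PER_NODE

    if __name__ == "__main__":
        for rank in range(WORLD_SIZE):
            print(f"rank {rank:2d}: full-shard owner = "
                  f"{full_sharding_owner(rank):2d}, "
                  f"partial-redundancy shard = {partial_redundancy_shard(rank)}")

Under this layout, each GPU stores 1/NUM_NODES of the model state instead of 1/WORLD_SIZE, a memory-for-communication trade-off consistent with the abstract's framing of memory and communication costs.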