Rethinking Memory and Communication Costs for Efficient Data Parallel Training of Large Language Models

Neural Information Processing Systems 

In recent years, a variety of strategies for the distributed training of large language models (LLMs) have been proposed.