Distributed Lion for Communication Efficient Distributed Training

May-28-2025, 17:57:00 GMT–Neural Information Processing Systems

The Lion optimizer has been a promising competitor to AdamW for training large AI models, with advantages in memory, computation, and sample efficiency. In this paper, we introduce Distributed Lion, an innovative adaptation of Lion for distributed training environments. Leveraging the sign operator in Lion, Distributed Lion only requires communicating binary or lower-precision vectors between the workers and the central server, significantly reducing the communication cost. Our theoretical analysis confirms Distributed Lion's convergence properties. Empirical results demonstrate its robustness across a range of tasks, worker counts, and batch sizes, on both vision and language problems. Notably, Distributed Lion attains performance comparable to standard Lion or AdamW optimizers applied on aggregated gradients, but with significantly reduced communication bandwidth. This feature is particularly advantageous for training large models. In addition, we demonstrate that Distributed Lion presents a more favorable performance-bandwidth balance than existing efficient distributed methods such as deep gradient compression and ternary gradients.
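To make the communication pattern in the abstract concrete, here is a minimal sketch in plain Python. It rests on several assumptions that the abstract does not state: each worker keeps its own momentum and applies the standard Lion rule (sign of an interpolation between momentum and gradient), the server combines the workers' binary vectors by an elementwise majority vote, and decoupled weight decay is applied when updating parameters. The function names (`worker_step`, `majority_vote`, `apply_update`) and the default hyperparameters are illustrative and hypothetical, not taken from the paper.

```python
from typing import List, Tuple

Vector = List[float]


def sign(v: float) -> int:
    """Return -1, 0, or +1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def worker_step(
    grad: Vector, momentum: Vector, beta1: float = 0.9, beta2: float = 0.99
) -> Tuple[List[int], Vector]:
    """One local Lion step on a worker (assumed form of the update).

    Returns the sign vector to send to the server and the worker's new momentum.
    Only the sign vector needs to cross the network.
    """
    update = [sign(beta1 * m + (1 - beta1) * g) for m, g in zip(momentum, grad)]
    new_momentum = [beta2 * m + (1 - beta2) * g for m, g in zip(momentum, grad)]
    return update, new_momentum


def majority_vote(updates: List[List[int]]) -> List[int]:
    """Server-side aggregation (an assumption): elementwise sign of the sum."""
    return [sign(sum(column)) for column in zip(*updates)]


def apply_update(
    params: Vector, direction: List[int], lr: float = 1e-4, weight_decay: float = 0.0
) -> Vector:
    """Apply the aggregated direction with decoupled weight decay."""
    return [p - lr * (d + weight_decay * p) for p, d in zip(params, direction)]


# Three hypothetical workers with different local gradients and zero momentum.
grads = [[1.0, -2.0], [0.5, 3.0], [-0.1, -1.0]]
momenta = [[0.0, 0.0] for _ in grads]

local_updates = []
for i, g in enumerate(grads):
    u, momenta[i] = worker_step(g, momenta[i])
    local_updates.append(u)

direction = majority_vote(local_updates)
new_params = apply_update([0.0, 0.0], direction, lr=0.1)
```

In this sketch each worker sends one sign per parameter instead of a full-precision gradient, which is where the bandwidth saving described in the abstract would come from. The actual aggregation and precision choices in the paper may differ.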

Country:
- North America > United States > Texas (0.14)

Genre:
- Research Report > New Finding (0.88)

Industry:
- Education (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks (1.00)
    - Statistical Learning (0.68)
  - Natural Language (1.00)
  - Representation & Reasoning (1.00)
  - Vision (0.92)