Improved Quantization Strategies for Managing Heavy-tailed Gradients in Distributed Learning
Guangfeng Yan, Tan Li, Yuanzhang Xiao, Hanxu Hou, Linqi Song
Gradient compression has surfaced as a key technique to address the challenge of communication efficiency in distributed learning. In distributed deep learning, however, gradient distributions are observed to be heavy-tailed, with outliers significantly influencing the design of compression strategies. Existing parameter quantization methods suffer performance degradation when this heavy-tailed feature is ignored. In this paper, we introduce a novel compression scheme specifically engineered for heavy-tailed gradients, which effectively combines gradient truncation with quantization. The scheme is implemented within a communication-limited distributed Stochastic Gradient Descent (SGD) framework. Considering a general family of heavy-tailed gradients that follow a power-law distribution, we minimize the error resulting from quantization and thereby determine optimal values for two critical parameters: the truncation threshold and the quantization density. We provide a theoretical analysis of the convergence error bound under both uniform and non-uniform quantization scenarios. Comparative experiments against other benchmarks demonstrate the effectiveness of the proposed method in managing heavy-tailed gradients in a distributed learning environment.
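The truncate-then-quantize idea described in the abstract can be illustrated with a minimal sketch. The snippet below clips each gradient coordinate to a truncation threshold tau and then applies stochastic uniform quantization with s levels, so only small integers plus two scalars need to be communicated. The function names, the NumPy implementation, and the particular values of tau and s are assumptions for illustration only; in the paper these two parameters are chosen by minimizing the quantization error under a power-law gradient model, and a non-uniform level placement is also analyzed.

```python
import numpy as np

def truncate_and_quantize(grad, tau, s):
    """Sketch of truncation followed by stochastic uniform quantization.

    grad : gradient vector (NumPy array)
    tau  : truncation threshold (clips heavy-tailed outliers)
    s    : number of quantization levels (quantization density)
    In the paper, tau and s are set by an error-minimization analysis;
    here they are treated as free parameters.
    """
    # 1. Truncate outlier coordinates to the interval [-tau, tau].
    g = np.clip(grad, -tau, tau)

    # 2. Stochastic (unbiased) uniform quantization onto s levels.
    delta = 2.0 * tau / s                       # width of one quantization bin
    scaled = (g + tau) / delta                  # map coordinates to [0, s]
    lower = np.floor(scaled)                    # nearest level below
    prob = scaled - lower                       # probability of rounding up
    levels = lower + (np.random.rand(*g.shape) < prob)
    # Transmit the integer levels plus (tau, s) instead of full-precision floats.
    return levels.astype(np.int32), tau, s

def dequantize(levels, tau, s):
    """Reconstruct an approximate gradient from the transmitted integers."""
    delta = 2.0 * tau / s
    return levels * delta - tau

# Hypothetical usage on one worker: compress a heavy-tailed gradient,
# then recover it on the server before averaging across workers.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = rng.standard_t(df=2, size=1000)      # heavy-tailed synthetic gradient
    levels, tau, s = truncate_and_quantize(grad, tau=3.0, s=16)
    approx = dequantize(levels, tau, s)
    print("mean squared quantization error:", np.mean((grad - approx) ** 2))
```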
arXiv.org Artificial Intelligence
Feb-2-2024