Improved Quantization Strategies for Managing Heavy-tailed Gradients in Distributed Learning
Guangfeng Yan, Tan Li, Yuanzhang Xiao, Hanxu Hou, Linqi Song
Gradient compression has surfaced as a key technique to address the challenge of communication efficiency in distributed learning. In distributed deep learning, however, gradient distributions are observed to be heavy-tailed, with outliers significantly influencing the design of compression strategies. Existing parameter quantization methods suffer performance degradation when this heavy-tailed feature is ignored. In this paper, we introduce a novel compression scheme specifically engineered for heavy-tailed gradients, which effectively combines gradient truncation with quantization. The scheme is implemented within a communication-limited distributed Stochastic Gradient Descent (SGD) framework. Considering a general family of heavy-tailed gradients that follow a power-law distribution, we minimize the error resulting from quantization and thereby determine optimal values for two critical parameters: the truncation threshold and the quantization density. We provide a theoretical analysis of the convergence error bound under both uniform and non-uniform quantization scenarios. Comparative experiments against other benchmarks demonstrate the effectiveness of the proposed method in managing heavy-tailed gradients in a distributed learning environment.
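The truncate-then-quantize idea described in the abstract can be illustrated with a minimal sketch. The snippet below clips each gradient coordinate to a truncation threshold tau and then applies stochastic uniform quantization with s levels, so only small integers plus two scalars need to be communicated. The function names, the NumPy implementation, and the particular values of tau and s are assumptions for illustration only; in the paper these two parameters are chosen by minimizing the quantization error under a power-law gradient model, and a non-uniform level placement is also analyzed.

```python
import numpy as np

def truncate_and_quantize(grad, tau, s):
    """Sketch of truncation followed by stochastic uniform quantization.

    grad : gradient vector (NumPy array)
    tau  : truncation threshold (clips heavy-tailed outliers)
    s    : number of quantization levels (quantization density)
    In the paper, tau and s are set by an error-minimization analysis;
    here they are treated as free parameters.
    """
    # 1. Truncate outlier coordinates to the interval [-tau, tau].
    g = np.clip(grad, -tau, tau)

    # 2. Stochastic (unbiased) uniform quantization onto s levels.
    delta = 2.0 * tau / s                       # width of one quantization bin
    scaled = (g + tau) / delta                  # map coordinates to [0, s]
    lower = np.floor(scaled)                    # nearest level below
    prob = scaled - lower                       # probability of rounding up
    levels = lower + (np.random.rand(*g.shape) < prob)
    # Transmit the integer levels plus (tau, s) instead of full-precision floats.
    return levels.astype(np.int32), tau, s

def dequantize(levels, tau, s):
    """Reconstruct an approximate gradient from the transmitted integers."""
    delta = 2.0 * tau / s
    return levels * delta - tau

# Hypothetical usage on one worker: compress a heavy-tailed gradient,
# then recover it on the server before averaging across workers.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = rng.standard_t(df=2, size=1000)      # heavy-tailed synthetic gradient
    levels, tau, s = truncate_and_quantize(grad, tau=3.0, s=16)
    approx = dequantize(levels, tau, s)
    print("mean squared quantization error:", np.mean((grad - approx) ** 2))
```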
arXiv.org Artificial Intelligence
Feb-2-2024