GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training

Yu, Mingchao, Lin, Zhifeng, Narra, Krishna, Li, Songze, Li, Youjie, Kim, Nam Sung, Schwing, Alexander, Annavaram, Murali, Avestimehr, Salman

Feb-14-2020, 16:13:30 GMT–Neural Information Processing Systems

Data parallelism can boost the training speed of convolutional neural networks (CNNs), but it can suffer from significant communication costs caused by gradient aggregation. To alleviate this problem, several scalar quantization techniques have been developed to compress the gradients. However, these techniques can perform poorly when used together with decentralized aggregation protocols such as ring all-reduce (RAR), mainly because they cannot directly aggregate compressed gradients. In this paper, we empirically demonstrate strong linear correlations between CNN gradients and propose a gradient vector quantization technique, named GradiVeQ, that exploits these correlations through principal component analysis (PCA) for substantial gradient dimension reduction. GradiVeQ enables direct aggregation of compressed gradients, which allows us to build a distributed learning system that parallelizes GradiVeQ gradient compression and RAR communications.
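The property that makes direct aggregation possible is the linearity of a PCA projection: the sum of the compressed gradients equals the compressed sum of the gradients. The following is a minimal sketch of that idea only, not the authors' implementation. The synthetic low-rank gradients, the helper names (`fit_pca_basis`, `compress`, `decompress`), and the sizes `d`, `k`, and `n_workers` are all assumptions made for illustration; the quantization of the projected coefficients and the actual RAR communication are omitted, with a plain sum standing in for the all-reduce.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, n_workers = 64, 4, 4  # hypothetical sizes: gradient dim, compressed dim, workers

# Assumption: gradients are linearly correlated, modeled here as points near a
# k-dimensional subspace of R^d plus a small amount of noise.
mixing = rng.standard_normal((k, d))

def sample_gradients(n):
    """Draw n synthetic gradients with strong linear correlations."""
    return rng.standard_normal((n, k)) @ mixing + 1e-3 * rng.standard_normal((n, d))

def fit_pca_basis(samples, k):
    """Return a (d, k) matrix whose orthonormal columns are the top-k principal directions."""
    centered = samples - samples.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def compress(gradient, basis):
    """Linear projection of a d-dimensional gradient onto the k principal directions."""
    return gradient @ basis

def decompress(coeffs, basis):
    """Map k coefficients back to a d-dimensional gradient estimate."""
    return coeffs @ basis.T

# Fit the basis on previously observed gradients, then reuse it on new ones.
basis = fit_pca_basis(sample_gradients(200), k)
worker_gradients = sample_gradients(n_workers)

# Each worker compresses locally; aggregation happens in the compressed domain.
aggregated_compressed = sum(compress(g, basis) for g in worker_gradients)

# Linearity: this matches compressing the uncompressed sum.
compressed_of_sum = compress(worker_gradients.sum(axis=0), basis)

# Decompress once, after aggregation, and compare with the true sum.
recovered = decompress(aggregated_compressed, basis)
true_sum = worker_gradients.sum(axis=0)
rel_err = np.linalg.norm(recovered - true_sum) / np.linalg.norm(true_sum)
print(f"compressed {d} -> {k} values per gradient, relative error {rel_err:.2e}")
```

In this sketch each worker sends `k` values instead of `d`, and decompression runs once on the aggregate rather than at every hop. Elementwise scalar quantizers are generally nonlinear, so their outputs usually have to be decompressed before they can be summed.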

bandwidth-efficient gradient aggregation, quantization technique, vector quantization

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.63)