Adaptive Gradient Quantization for Data-Parallel SGD

Faghri, Fartash; Tabrizian, Iman; Markov, Ilia; Alistarh, Dan; Roy, Daniel; Ramezani-Kebrya, Ali

Oct-23-2020, arXiv.org Machine Learning

Gradient quantization schemes for data-parallel SGD are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. Our adaptive methods are also significantly more robust to the choice of hyperparameters.
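To make the idea of quantizing gradients and summarizing them with sufficient statistics concrete, here is a minimal sketch in plain Python. It is not the ALQ or AMQ algorithm from the paper: it shows generic unbiased stochastic rounding of normalized gradient magnitudes onto a fixed set of levels, plus per-worker statistics (count, sum, sum of squares) that can be merged across workers. All function names, the choice of statistics, and the example levels are assumptions made for illustration; how the levels would be adapted from the merged statistics is not shown.

```python
import bisect
import math
import random


def sufficient_stats(grad):
    """Return (count, sum, sum of squares) of normalized magnitudes |g_i| / ||g||.

    Illustrative choice of statistics: enough to recover a mean and variance.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return (0, 0.0, 0.0)
    r = [abs(g) / norm for g in grad]
    return (len(r), sum(r), sum(x * x for x in r))


def merge_stats(a, b):
    """Combine statistics from two workers by element-wise addition."""
    return tuple(x + y for x, y in zip(a, b))


def mean_and_variance(stats):
    """Recover mean and (population) variance from merged statistics."""
    n, s, ss = stats
    mean = s / n
    return mean, ss / n - mean * mean


def stochastic_quantize(grad, levels, rng):
    """Round each |g_i| / ||g|| to a neighboring level, unbiased in expectation.

    `levels` is assumed sorted, strictly increasing, starting at 0.0 and
    ending at 1.0. Signs and the Euclidean norm are kept at full precision.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    out = []
    for g in grad:
        r = abs(g) / norm
        # Index of the interval [levels[j], levels[j + 1]] that contains r.
        j = min(bisect.bisect_right(levels, r) - 1, len(levels) - 2)
        lo, hi = levels[j], levels[j + 1]
        p_up = (r - lo) / (hi - lo)  # probability of rounding up
        q = hi if rng.random() < p_up else lo
        out.append(math.copysign(q, g) * norm)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    levels = [0.0, 0.25, 0.5, 1.0]  # hypothetical levels, fixed here
    worker_a = [0.3, -0.4, 0.0, 1.2]
    worker_b = [3.0, 4.0]
    merged = merge_stats(sufficient_stats(worker_a), sufficient_stats(worker_b))
    print(mean_and_variance(merged))
    print(stochastic_quantize(worker_a, levels, rng))
```

In an adaptive scheme, the merged statistics would be used to refit a distribution and move the levels as training progresses; in this sketch the levels stay fixed, which corresponds to the non-adaptive baseline the abstract describes.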

artificial intelligence, machine learning, variance



Country:
- Asia > Russia (0.04)
- North America > Canada
  - Ontario > Toronto (0.14)
  - British Columbia > Metro Vancouver Regional District
    - Vancouver (0.04)
- Europe
  - Russia (0.04)
  - Austria (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Italy > Calabria
    - Catanzaro Province > Catanzaro (0.04)

Genre:
- Research Report (0.81)

Industry:
- Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (0.68)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
