Adaptive Gradient Quantization for Data-Parallel SGD

Neural Information Processing Systems 

Gradient quantization schemes for data-parallel SGD are often heuristic and remain fixed over the course of training.