Adaptive Gradient Quantization for Data-Parallel SGD