Adaptive Gradient Quantization for Data-Parallel SGD

Neural Information Processing Systems 

Gradient quantization schemes are often heuristic and fixed over the course of training.
