Reviews: GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
–Neural Information Processing Systems
This describes how a PCA, computed during a periodic full-vector (uncompressed) aggregation phase, can be used to derive a good compression scheme for subsequent gradient exchanges. The authors do a good job arguing that linear correlation among gradient vectors should be a common case in machine learning. On the down side, adopting it requires a fair amount of coding work, because training must still include a periodic "full gradient" phase. The PCA and the way it is approximated are practical heuristics, so I don't expect a convergence proof to be possible without a bit of fine print. I also did not see a discussion of what happens with rare classes.
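To make the idea concrete, here is a minimal sketch of PCA-based gradient compression in the spirit of the paper: a projection is learned during an uncompressed "full gradient" phase, then only low-dimensional coefficients are communicated. All names, shapes, and the synthetic near-low-rank gradients are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of PCA-based gradient vector quantization.
# Shapes and data are assumptions for illustration only.
rng = np.random.default_rng(0)

d, n, k = 8, 1000, 2  # vector length, number of gradient slices, retained components
base = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
G = base + 0.01 * rng.standard_normal((n, d))  # nearly rank-k gradient slices

# "Full gradient" phase: estimate principal directions from uncompressed gradients.
mu = G.mean(axis=0)
_, _, Vt = np.linalg.svd(G - mu, full_matrices=False)
U = Vt[:k].T  # d x k projection learned during the uncompressed phase

# Compressed phase: transmit k coefficients per slice instead of d raw values.
coeffs = (G - mu) @ U            # n x k, what actually goes over the network
G_hat = coeffs @ U.T + mu        # decompression at the receiver

compression_ratio = d / k
rel_err = np.linalg.norm(G - G_hat) / np.linalg.norm(G)
print(compression_ratio, round(rel_err, 4))
```

When the gradient slices really are close to a low-dimensional subspace, as the authors argue, the relative reconstruction error stays small while bandwidth shrinks by d/k; the periodic full-gradient phase exists precisely to keep the learned subspace current.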
Oct-8-2024, 04:41:19 GMT