ErrorCompensatedX: error compensation for variance reduced algorithms

Jan-17-2025, 18:43:36 GMT–Neural Information Processing Systems

Communication cost is a major bottleneck for the scalability of distributed learning. One approach to reducing it is to compress the gradient during communication. However, directly compressing the gradient slows convergence, and the resulting algorithm may diverge when the compression is biased. Recent work addressed this problem for stochastic gradient descent by adding back the compression error from the previous step. This idea was further extended to one class of variance reduced algorithms, where the variance of the stochastic gradient is reduced by taking a moving average over all historical gradients.
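The "adding back the compression error from the previous step" idea for plain stochastic gradient descent can be illustrated with a minimal sketch. It is not the ErrorCompensatedX algorithm itself, only the basic error-compensation step that the abstract describes as prior work. The top-k compressor, the function names (`top_k`, `compress_with_feedback`, `error_compensated_descent`), and the toy quadratic objective are all assumptions chosen for illustration.

```python
def top_k(vec, k):
    """Biased compressor (assumed for illustration): keep the k
    largest-magnitude entries of vec and zero out the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


def compress_with_feedback(grad, err, k):
    """One error-compensated compression step.

    The error left over from the previous step is added back to the
    fresh gradient before compressing; whatever the compressor drops
    this time becomes the new error carried to the next step.
    """
    corrected = [g + e for g, e in zip(grad, err)]
    sent = top_k(corrected, k)
    new_err = [c - s for c, s in zip(corrected, sent)]
    return sent, new_err


def error_compensated_descent(grad_fn, x0, lr=0.1, k=1, steps=200):
    """Gradient descent that only ever applies the compressed vector."""
    x = list(x0)
    err = [0.0] * len(x)
    for _ in range(steps):
        sent, err = compress_with_feedback(grad_fn(x), err, k)
        x = [xi - lr * si for xi, si in zip(x, sent)]
    return x


# Toy objective (hypothetical): f(x) = 0.5 * ||x - target||^2,
# whose gradient is simply x - target.
target = [1.0, -2.0, 3.0]


def toy_grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


x_compressed = error_compensated_descent(toy_grad, [0.0, 0.0, 0.0], k=1)
x_uncompressed = error_compensated_descent(toy_grad, [0.0, 0.0, 0.0], k=3)
```

With `k` equal to the dimension nothing is dropped, the carried error stays at zero, and the loop reduces to ordinary gradient descent. With smaller `k`, the dropped coordinates are not lost but deferred to later steps through `err`.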

algorithm, error compensation, variance

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)