Don't Compress Gradients in Random Reshuffling: Compress Gradient Differences

May-31-2025, 14:41:43 GMT–Neural Information Processing Systems

Gradient compression is a popular technique for improving the communication complexity of stochastic first-order methods in distributed training of machine learning models. However, existing works consider only with-replacement sampling of stochastic gradients. In contrast, it is well known in practice, and was recently confirmed in theory, that stochastic methods based on without-replacement sampling, e.g., Random Reshuffling (

machine learning, natural language, qsgd-rr

Country:
- Europe (0.67)
- North America > Canada
  - Ontario > Toronto (0.14)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (1.00)

Industry:
- Information Technology (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks > Deep Learning (0.93)
    - Statistical Learning (1.00)
  - Natural Language (1.00)
  - Representation & Reasoning (1.00)
  - Vision (1.00)