bit Shampoo for Memory-Efficient Network Training

May-25-2025, 19:52:38 GMT–Neural Information Processing Systems

Second-order optimizers, which maintain a matrix called a preconditioner, outperform first-order optimizers in both theory and practice. However, the optimizer states that make up the preconditioner and its inverse root occupy substantial memory, which limits the maximum size of models that second-order optimizers can train. To address this, compressing 32-bit optimizer states to lower bitwidths has shown promise for reducing memory usage.
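To make the memory argument concrete, the following sketch counts the bytes that Shampoo-style states occupy for one m × n weight matrix. It compares float32 storage with a generic block-wise 4-bit absmax quantizer. The sketch assumes that the left preconditioner L (which accumulates G Gᵀ), the right preconditioner R (which accumulates Gᵀ G), and their inverse fourth roots are all stored, as in the original Shampoo update W ← W − η L^{-1/4} G R^{-1/4}. The function names, the 64-element block size, and the one-float32-scale-per-block layout are illustrative assumptions. This is not the quantization scheme described in the paper.

```python
"""Hypothetical sketch: memory cost of Shampoo preconditioner states and a
generic block-wise linear (absmax) quantizer. Not the paper's method."""
from typing import List, Tuple


def shampoo_state_bytes(m: int, n: int, bits: int = 32, block_size: int = 64) -> int:
    """Bytes of Shampoo states for one m x n weight matrix.

    Assumption: four matrices are stored, namely L (m x m), R (n x n),
    L^{-1/4} (m x m) and R^{-1/4} (n x n). For bits < 32, every block of
    `block_size` values also stores one float32 scale (assumed layout).
    """
    numel = 2 * (m * m + n * n)  # L, R and their inverse fourth roots
    if bits == 32:
        return numel * 4  # 4 bytes per float32 entry
    num_blocks = -(-numel // block_size)  # ceiling division
    payload = -(-numel * bits // 8)  # packed low-bit codes, rounded up to whole bytes
    return payload + num_blocks * 4  # plus one float32 scale per block


def quantize_blockwise(
    x: List[float], bits: int = 4, block_size: int = 64
) -> Tuple[List[int], List[float]]:
    """Symmetric absmax quantization: each block is mapped to integers in [-q, q]."""
    q = 2 ** (bits - 1) - 1  # 7 for 4-bit codes
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        absmax = max(abs(v) for v in block)
        scale = absmax / q if absmax > 0 else 1.0  # avoid division by zero for all-zero blocks
        scales.append(scale)
        codes.extend(round(v / scale) for v in block)
    return codes, scales


def dequantize_blockwise(
    codes: List[int], scales: List[float], block_size: int = 64
) -> List[float]:
    """Inverse of quantize_blockwise: multiply each code by its block's scale."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    full = shampoo_state_bytes(4096, 4096, bits=32)
    low = shampoo_state_bytes(4096, 4096, bits=4)
    print(f"32-bit: {full / 2**20:.0f} MiB, 4-bit: {low / 2**20:.0f} MiB")
    # prints: 32-bit: 256 MiB, 4-bit: 36 MiB
```

Under these assumptions, a single 4096 × 4096 layer needs 256 MiB of preconditioner state in float32 but about 36 MiB with 4-bit codes, roughly a 7× reduction. Per-block scales are used because a single scale for the whole matrix would let one large entry wash out the resolution of every other entry. Smaller blocks improve accuracy at the cost of more scale overhead.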

large language model, machine learning, natural language

Country:
- Asia
  - China (0.14)
  - Singapore (0.14)

Genre:
- Research Report > Experimental Study (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (0.93)
  - Representation & Reasoning (0.67)
  - Vision (1.00)