InRank: Incremental Low-Rank Learning

Zhao, Jiawei, Zhang, Yifei, Chen, Beidi, Schäfer, Florian, Anandkumar, Anima

Dec-31-2023–arXiv.org Artificial Intelligence

The theory of greedy low-rank learning (GLRL) aims to explain the impressive generalization capabilities of deep learning. It proves that stochastic gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training. However, there is a gap between theory and practice since GLRL requires an infinitesimal initialization of the weights, which is not practical due to the fact that it is a saddle point. In this work, we remove the assumption of infinitesimal initialization by focusing on cumulative weight updates. We prove the cumulative weight updates follow an incremental low-rank trajectory for arbitrary orthogonal initialization of weights in a three-layer linear network. Empirically, we demonstrate that our theory holds on a broad range of neural networks (e.g., transformers) and standard training algorithms (e.g., SGD, Adam). However, existing training algorithms do not exploit the low-rank property to improve computational efficiency as the networks are not parameterized in low-rank. To remedy this, we design a new training algorithm Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices while incrementally augmenting their ranks during training. We evaluate InRank on GPT-2, and our results indicate that InRank achieves comparable prediction performance as the full-rank counterpart while requiring at most 33% of the total ranks throughout training. We also propose an efficient version of InRank that achieves a reduction of 37% in total training time and 36% in model size when training GPT-medium on WikiText-103 from scratch.

initialization, inrank, singular vector, (13 more...)

arXiv.org Artificial Intelligence

Dec-31-2023

arXiv.org PDF

Add feedback

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States
  - California (0.04)
  - Wisconsin > Dane County
    - Madison (0.04)
  - Washington > King County
    - Seattle (0.04)
  - Pennsylvania > Allegheny County
    - Pittsburgh (0.04)

Genre:
- Research Report > New Finding (0.48)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)