Compressing Large Language Models using Low Rank and Low Precision Decomposition

Neural Information Processing Systems 

Due to the correlated nature of language syntax and semantics learned during training, the weight matrices of LLMs often exhibit redundancy, which manifests as an approximately low-rank structure. This redundancy suggests that the matrices can be compressed without substantial loss in performance.
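The core idea can be illustrated with a minimal sketch (not the paper's actual method, which also uses low-precision quantization): a truncated SVD keeps only the top-k singular directions of a weight matrix. On a synthetic matrix with an approximately low-rank structure, this recovers the matrix accurately while storing far fewer parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "weight matrix" with approximately low-rank structure:
# a rank-16 component plus small noise (stand-in for an LLM weight).
d, r = 512, 16
W = rng.standard_normal((d, r)) @ rng.standard_normal((r, d)) \
    + 0.01 * rng.standard_normal((d, d))

# Truncated SVD: keep only the top-k singular triplets.
k = 16
U, S, Vt = np.linalg.svd(W, full_matrices=False)
W_k = (U[:, :k] * S[:k]) @ Vt[:k, :]

# Relative Frobenius reconstruction error and parameter savings.
rel_err = np.linalg.norm(W - W_k) / np.linalg.norm(W)
params_full = d * d
params_lowrank = 2 * d * k + k  # factors U_k, V_k plus singular values
print(f"relative error: {rel_err:.4f}")
print(f"compression ratio: {params_full / params_lowrank:.1f}x")
```

Here a rank-16 factorization stores roughly 16x fewer parameters than the dense 512x512 matrix while the reconstruction error stays small, which is the redundancy the abstract refers to.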
