PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression

Neural Information Processing Systems 

There has been significant interest in "extreme" compression of large language models (LLMs), i.e., to 1-2 bits per parameter, which allows such models to be executed efficiently on resource-constrained devices.
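To show why 1-2 bits per parameter matters on constrained hardware, here is a minimal back-of-the-envelope sketch. It assumes a hypothetical 7-billion-parameter model and counts only the raw weight bits. It ignores every other cost, including quantization metadata (codebooks, scales), activations, and the KV cache, so real footprints will be somewhat larger.

```python
def weight_storage_gib(num_params: int, bits_per_param: float) -> float:
    """Raw storage for the weights alone, in GiB (2**30 bytes).

    Simplification: ignores codebooks, scales, and any other
    per-group quantization metadata.
    """
    total_bytes = num_params * bits_per_param / 8  # 8 bits per byte
    return total_bytes / 2**30


# Hypothetical example: a 7-billion-parameter model.
params = 7_000_000_000
for bits in (16, 4, 2, 1):
    print(f"{bits:>2} bits/param: {weight_storage_gib(params, bits):6.2f} GiB")
```

Going from 16-bit weights to 2 bits per parameter reduces raw weight storage by a factor of 8. That is the gap between needing a large GPU and fitting on a consumer device.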
