LLMEasyQuant -- An Easy to Use Toolkit for LLM Quantization
Liu, Dong, Jiang, Meng, Pister, Kaiser
arXiv.org Artificial Intelligence
Quantization is the process of mapping a large set of input values to a smaller set of output values, often integers. It is a key technique in digital signal processing, where continuous signals are mapped to discrete digital values; it reduces the data's precision to make storage and computation more efficient while attempting to retain the essential information. With the development of Large Language Models (LLMs), models have grown extremely large, so memory usage and inference speed are increasingly limited by model size. Consequently, as one of the most popular techniques for model compression, quantization now has many variants used for LLM compression and inference acceleration. The goal of quantization in LLMs is to reduce model size while minimizing the impact on inference speed.
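The mapping described above can be illustrated with a minimal sketch of symmetric absmax quantization to 8-bit integers. This is a generic example for intuition, not code from the LLMEasyQuant toolkit; the function names are illustrative.

```python
def quantize_int8(values):
    # Symmetric absmax quantization: scale floats so the largest
    # magnitude maps to the int8 extreme (here +/-127), then round.
    scale = max(abs(v) for v in values) / 127.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    # Recover approximate floats; precision lost to rounding stays lost.
    return [q * scale for q in quantized]

weights = [0.5, -1.2, 0.03, 0.98]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

Storing the int8 values plus one float scale uses roughly a quarter of the memory of float32 weights, at the cost of a bounded rounding error of at most half the scale per value.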
Jul-2-2024