LLMEasyQuant -- An Easy to Use Toolkit for LLM Quantization

Liu, Dong, Jiang, Meng, Pister, Kaiser

arXiv.org Artificial Intelligence 

Quantization is the process of mapping a large set of input values to a smaller set of output values, often integers. It is a key technique in digital signal processing, where continuous signals are mapped to discrete digital values, and it reduces the data's precision to make storage and computation more efficient while attempting to retain essential information. As Large Language Models (LLMs) have grown extremely large, their memory usage and inference speed are greatly limited by model size. Consequently, as one of the most popular techniques for model compression, quantization now has many variants used for LLM compression and inference acceleration. The goal of quantization in LLMs is to reduce model size while minimizing the impact on inference speed.
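The mapping described above can be sketched in a few lines. This is a minimal illustration of symmetric per-tensor int8 quantization, not the LLMEasyQuant API: a single scale maps floats onto the integer range [-127, 127], and dequantization recovers an approximation of the original values.

```python
def quantize_int8(values):
    # Symmetric per-tensor quantization: one scale maps floats onto [-127, 127].
    scale = max(abs(v) for v in values) / 127.0  # step size between adjacent codes
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    # Recover approximate floats from the integer codes.
    return [c * scale for c in codes]

weights = [0.5, -1.2, 3.4, -0.01]
codes, scale = quantize_int8(weights)
recovered = dequantize(codes, scale)
# each recovered value differs from the original by at most scale / 2
```

Storing int8 codes plus one float scale cuts memory roughly 4x versus float32, which is the trade the abstract describes: lower precision in exchange for cheaper storage and faster computation.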
