LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

Neural Information Processing Systems 

To cope with these emergent outlier features, we develop a two-part quantization procedure, LLM.int8().
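The two parts are vector-wise quantization (a separate absmax scaling constant per row of the input and per column of the weight) and a mixed-precision decomposition that keeps outlier feature dimensions in floating point. A minimal NumPy sketch of this idea, with the function name, a hypothetical `threshold` parameter, and all shapes chosen for illustration (not the paper's CUDA implementation):

```python
import numpy as np

def int8_matmul_with_outliers(X, W, threshold=6.0):
    """Sketch of LLM.int8()-style matmul: int8 for regular columns,
    floating point for outlier columns (illustrative, not the real kernel)."""
    # Mixed-precision decomposition: input columns whose magnitude
    # exceeds `threshold` are treated as outlier features.
    outlier_cols = np.max(np.abs(X), axis=0) > threshold
    regular_cols = ~outlier_cols

    out = np.zeros((X.shape[0], W.shape[1]), dtype=np.float32)

    if regular_cols.any():
        X_reg, W_reg = X[:, regular_cols], W[regular_cols, :]
        # Vector-wise absmax scaling: one constant per row of X,
        # one per column of W.
        cx = np.max(np.abs(X_reg), axis=1, keepdims=True) / 127.0
        cw = np.max(np.abs(W_reg), axis=0, keepdims=True) / 127.0
        cx[cx == 0] = 1.0
        cw[cw == 0] = 1.0

        Xq = np.round(X_reg / cx).astype(np.int8)
        Wq = np.round(W_reg / cw).astype(np.int8)

        # Int8 matmul accumulated in int32, then dequantized by the
        # outer product of the row and column scaling constants.
        acc = Xq.astype(np.int32) @ Wq.astype(np.int32)
        out += acc.astype(np.float32) * cx * cw

    # Outlier feature dimensions stay in floating point.
    if outlier_cols.any():
        out += X[:, outlier_cols] @ W[outlier_cols, :]
    return out
```

The per-row/per-column constants factor out of the matmul exactly (each product term is scaled by `cx[i] * cw[j]`), which is what makes vector-wise dequantization a single elementwise multiply after the int32 accumulation.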
