QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models
