GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models
