Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
