Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference
