Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
