QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
