Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference
