Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement
