Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
