Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding
