Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism
