GRIFFIN: Effective Token Alignment for Faster Speculative Decoding

Neural Information Processing Systems 

Speculative decoding accelerates inference in large language models (LLMs) by generating multiple draft tokens simultaneously.
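
As a point of reference, the draft-then-verify loop behind generic speculative decoding can be sketched as follows. This is a minimal illustration of the standard greedy variant, not GRIFFIN's method: the names `speculative_step`, `toy_draft`, and `toy_target` are hypothetical, and the two "models" are toy functions that map a token prefix to a next token. In a real system the verification loop would be a single batched forward pass of the target model.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def toy_target(prefix: List[int]) -> int:
    """Toy target model: always continues with (last token + 1) mod 10."""
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    """Toy draft model: agrees with the target except after token 3."""
    return 0 if prefix[-1] == 3 else (prefix[-1] + 1) % 10


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """Run one round of greedy speculative decoding and return the new tokens."""
    # 1. The draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    # 2. The target model checks each proposed position in order.
    accepted: List[int] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target adds one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


if __name__ == "__main__":
    print(speculative_step([0], toy_draft, toy_target, k=4))
    print(speculative_step([4], toy_draft, toy_target, k=2))
```

Each round emits between 1 and k + 1 tokens, and the output matches what the target model alone would produce under greedy decoding. The speedup therefore depends on how often the draft tokens agree with the target.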