GRIFFIN: Effective Token Alignment for Faster Speculative Decoding

Neural Information Processing Systems 

Speculative decoding accelerates inference in large language models (LLMs) by generating multiple draft tokens simultaneously.
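As a rough illustration of the general draft-then-verify idea behind speculative decoding (not GRIFFIN's own method, which this excerpt does not describe), the sketch below uses greedy acceptance with toy stand-in models. The names `speculative_step`, `toy_target`, and `toy_draft`, and the next-token callable interface, are assumptions made for this example. The draft tokens are produced one by one here for clarity, whereas practical systems may propose them in parallel or as a tree.

```python
from typing import Callable, List

# Hypothetical interface: a model is a function mapping a token prefix
# to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """Run one draft-then-verify step and return the newly accepted tokens."""
    # 1. The cheap draft model proposes k tokens.
    drafted: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        token = draft(ctx)
        drafted.append(token)
        ctx.append(token)

    # 2. The target model checks each drafted position. A real system would
    #    score all k positions in a single batched forward pass; the loop
    #    here only keeps the toy example readable.
    accepted: List[int] = []
    ctx = list(prefix)
    for token in drafted:
        expected = target(ctx)
        if token != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(token)
        ctx.append(token)

    # 3. Every draft was accepted, so the target adds one extra token.
    accepted.append(target(ctx))
    return accepted


def toy_target(ctx: List[int]) -> int:
    """Toy target model (assumption): counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    """Toy draft model (assumption): agrees with the target except at 5."""
    nxt = (ctx[-1] + 1) % 10
    return 0 if nxt == 5 else nxt


if __name__ == "__main__":
    print(speculative_step(toy_target, toy_draft, [1], k=3))
    print(speculative_step(toy_target, toy_draft, [3], k=3))
```

With greedy acceptance, the accepted tokens match what the target model alone would have generated, so the speedup depends on how often the draft agrees with the target.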
