Accelerating Blockwise Parallel Language Models with Draft Refinement

Taehyeon Kim

Neural Information Processing Systems 

First, we analyze token distributions generated across multiple prediction heads.
