A Theoretical Perspective for Speculative Decoding Algorithm

Ming Yin

Neural Information Processing Systems 

Transformer-based autoregressive sampling has been the major bottleneck slowing down large language model inference.
