A Theoretical Perspective for Speculative Decoding Algorithm Ming Yin
–Neural Information Processing Systems
Transformer-based autoregressive sampling has been the major bottleneck for slowing down large language model inferences.
Neural Information Processing Systems
Nov-20-2025, 06:03:35 GMT