A Theoretical Perspective for Speculative Decoding Algorithm Ming Yin
–Neural Information Processing Systems
Transformer-based autoregressive sampling has been the major bottleneck for slowing down large language model inferences.
Neural Information Processing Systems
Oct-10-2025, 19:59:47 GMT