Scalable and Robust Speculative Decoding
Neural Information Processing Systems
As the usage of large language models (LLMs) grows, it becomes increasingly important to serve them quickly and efficiently. While speculative decoding has recently emerged as a promising direction for accelerating LLM serving, existing methods are limited in their ability to scale to larger speculation budgets and adapt to different hyperparameters.
Mar-27-2025, 13:27:10 GMT
- Country:
- North America > United States > Hawaii (0.14)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.93)
- Technology: