EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Neural Information Processing Systems
The sequential nature of modern LLMs makes them expensive and slow, and speculative sampling has proven to be an effective solution to this problem. Methods like EAGLE perform autoregression at the feature level, reusing top-layer features from the target model to achieve better results than vanilla speculative sampling. A growing trend in the LLM community is scaling up training data to improve model intelligence without increasing inference costs. However, we observe that scaling up data provides limited improvements for EAGLE. We identify that this limitation arises from EAGLE's feature prediction constraints.
Jun-22-2026, 17:43:15 GMT
- Country:
- North America > United States (0.14)
- Genre:
- Research Report > Experimental Study (1.00)
- Technology: