Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios
