On multi-token prediction for efficient LLM inference
