Hierarchical Verification of Speculative Beams for Accelerating LLM Inference
