SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification
