Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Open in new window