SpecExec: Massively Parallel Speculative Decoding For Interactive LLM Inference on Consumer Devices
