SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
