Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
