Fast Inference from Transformers via Speculative Decoding
