Accelerating Large Language Model Decoding with Speculative Sampling
