Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding
