Efficient Speculative Decoding for Llama at Scale: Challenges and Solutions
