Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exiting
