Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exiting

Fangcheng Liu, Yehui Tang, Zhenhua Liu, Yunsheng Ni, Duyu Tang, Kai Han, Yunhe Wang

Neural Information Processing Systems 

To bridge the representation gap between the sub-network and the full model, we train a lightweight and efficient adapter module on top of the sub-network.
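
As a rough illustration of this idea, the sketch below runs only the first few layers of a toy model (the sub-network), passes the resulting hidden state through a small adapter, and then produces draft logits. Everything here is an assumption made for illustration: `ToyModel`, `Adapter`, and `draft_logits` are hypothetical names, the adapter is a single zero-initialised residual linear map, and reusing the full model's output head for the draft is assumed rather than taken from this excerpt, which does not describe the adapter's architecture or training. Training is omitted.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ToyModel:
    """Hypothetical stand-in for an LLM: residual linear layers plus an output head."""

    def __init__(self, dim, vocab, n_layers, seed=0):
        rng = random.Random(seed)
        self.layers = [rand_matrix(dim, dim, rng) for _ in range(n_layers)]
        self.lm_head = rand_matrix(vocab, dim, rng, scale=1.0)

    def hidden(self, x, n_layers=None):
        # Run only the first n_layers layers (early exit); all layers if None.
        for W in self.layers[:n_layers]:
            x = [a + b for a, b in zip(x, matvec(W, x))]
        return x

    def logits(self, h):
        return matvec(self.lm_head, h)


class Adapter:
    """Hypothetical lightweight adapter: one residual linear map, zero-initialised."""

    def __init__(self, dim):
        self.W = [[0.0] * dim for _ in range(dim)]

    def __call__(self, h):
        return [a + b for a, b in zip(h, matvec(self.W, h))]


def draft_logits(model, adapter, x, exit_layer):
    """Sub-network hidden state -> adapter -> output head of the full model (assumed shared)."""
    h_shallow = model.hidden(x, n_layers=exit_layer)
    return model.logits(adapter(h_shallow))


model = ToyModel(dim=4, vocab=6, n_layers=8)
adapter = Adapter(dim=4)
x = [1.0, 0.5, -0.5, 0.25]

draft = draft_logits(model, adapter, x, exit_layer=2)  # cheap: 2 of 8 layers
full = model.logits(model.hidden(x))                   # expensive: all 8 layers
```

With the adapter at its zero initialisation, `draft` equals the plain early-exit logits. In this sketch, training would adjust only `adapter.W`, leaving the model's own weights untouched, so that `draft` moves closer to `full`.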
