Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration

Neural Information Processing Systems 

Therefore, it has a high decoding speed but an unsatisfactory acceptance rate.
