VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
