SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference
