Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding
