Accelerating Production LLMs with Combined Token/Embedding Speculators
