Accelerating Retrieval-Augmented Language Model Serving with Speculation
