SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding
