Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction
