Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving

Open in new window