From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill
