BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching
