Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading
