KunServe: Elastic and Efficient Large Language Model Serving with Parameter-centric Memory Management
