LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism
