Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
