LLM Inference Serving: Survey of Recent Advances and Opportunities

Baolin Li, Yankai Jiang, Vijay Gadepally, Devesh Tiwari

Jul-17-2024–arXiv.org Artificial Intelligence

This survey offers a comprehensive overview of recent advancements in Large Language Model (LLM) serving systems, focusing on research published since 2023. We specifically examine system-level enhancements that improve performance and efficiency without altering the core LLM decoding mechanisms. By selecting and reviewing high-quality papers from prestigious ML and systems venues, we highlight key innovations and practical considerations for deploying and scaling LLMs in real-world production environments. This survey serves as a valuable resource for LLM practitioners seeking to stay abreast of the latest developments in this rapidly evolving field.
