Stateful Large Language Model Serving with Pensieve

Dec-9-2023–arXiv.org Artificial Intelligence

Existing LLM serving systems are stateless across requests. Consequently, when LLMs are used in the common setting of multi-turn conversations, a growing log of the conversation history must be processed alongside any request by the serving system at each turn, resulting in repeated ...

In the conversational setup, the user and the chatbot are engaged in a dialogue that may last many rounds. In order for the chatbot not to "lose memory" of what has been said so far when responding, the cumulative history of the dialogue must be part of the context for the LLM's autoregressive generation.
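The cost of reprocessing a growing conversation history can be illustrated with a small token-counting sketch. This is not Pensieve's implementation. The function name `prefill_tokens` and the per-turn token counts are hypothetical, and the model is deliberately simplified: each entry is the number of tokens a turn adds to the history, and a stateful server is assumed to keep all earlier state cached so that it processes only the new tokens.

```python
def prefill_tokens(turn_lengths, stateful):
    """Count the tokens a server must process over a whole conversation.

    turn_lengths: tokens added to the history at each turn (hypothetical values).
    stateful: if True, assume earlier tokens are cached and never reprocessed.
    """
    total = 0
    history = 0  # tokens accumulated from earlier turns
    for new_tokens in turn_lengths:
        if stateful:
            # Only the tokens new to this turn need processing.
            total += new_tokens
        else:
            # A stateless server re-reads the full history plus the new tokens.
            total += history + new_tokens
        history += new_tokens
    return total


turns = [100, 100, 100, 100]
stateless_cost = prefill_tokens(turns, stateful=False)  # 100 + 200 + 300 + 400
stateful_cost = prefill_tokens(turns, stateful=True)    # 100 + 100 + 100 + 100
print(stateless_cost, stateful_cost)
```

Under these assumptions the stateless total grows quadratically with the number of turns, while the stateful total grows linearly.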

kernel, kv-token, pensieve, (15 more...)

Country:
- Oceania > Australia
  - Australian Capital Territory > Canberra (0.04)
- North America > United States
  - New York > New York County > New York City (0.04)
- Europe > Italy
  - Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)