LLavaCode: Compressed Code Representations for Retrieval-Augmented Code Generation
Cherniuk, Daria, Sukhorukov, Nikita, Sushko, Nikita, Gusak, Daniil, Sivtsov, Danil, Tutubalina, Elena, Frolov, Evgeny
arXiv.org Artificial Intelligence
Retrieval-augmented generation has emerged as one of the most effective approaches for code completion, particularly when context from the surrounding repository is essential. However, incorporating this context significantly extends sequence length, leading to slower inference, a critical limitation for interactive settings such as IDEs. In this work, we introduce LlavaCode, a framework that compresses code into compact, semantically rich representations interpretable by a code LLM, enhancing generation quality while reducing the retrieved context to only a few compressed single-token vectors. Using a small projector module, we can significantly increase the Exact Match (EM) and Edit Similarity (ES) metrics of the coding model with a negligible latency increase. Our experiments demonstrate that compressed context enables a 20-38% reduction in Time-to-First-Token (TTFT) on line completion tasks compared to full-RAG pipelines.
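The core idea described above, compressing retrieved code context into a handful of vectors that live in the code LLM's embedding space via a small projector, can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: all module names, dimensions, and the choice of learned-query cross-attention followed by an MLP are assumptions for the sake of the example.

```python
import torch
import torch.nn as nn

class ContextProjector(nn.Module):
    """Illustrative sketch: pool encoder states of retrieved code into a
    few learned-query slots, then project them into the LLM embedding
    space. Sizes and architecture are assumptions, not from the paper."""

    def __init__(self, enc_dim=768, llm_dim=2048, n_compressed=4):
        super().__init__()
        # Learned queries: one per compressed "single-token" vector
        self.queries = nn.Parameter(torch.randn(n_compressed, enc_dim) * 0.02)
        self.pool = nn.MultiheadAttention(enc_dim, num_heads=8, batch_first=True)
        # Small MLP projector mapping into the LLM's embedding dimension
        self.proj = nn.Sequential(
            nn.Linear(enc_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, enc_states):
        # enc_states: (batch, retrieved_seq_len, enc_dim)
        b = enc_states.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        pooled, _ = self.pool(q, enc_states, enc_states)  # (b, n_compressed, enc_dim)
        return self.proj(pooled)                           # (b, n_compressed, llm_dim)

# A 512-token retrieved snippet is reduced to 4 LLM-space vectors,
# which would be prepended to the completion prompt's embeddings.
ctx = torch.randn(1, 512, 768)
compressed = ContextProjector()(ctx)
print(tuple(compressed.shape))  # (1, 4, 2048)
```

The latency benefit follows directly from the shape change: the LLM prefill attends over 4 context vectors instead of 512 context tokens, which is what shrinks Time-to-First-Token.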
Oct-23-2025