PISCO: Pretty Simple Compression for Retrieval-Augmented Generation
Maxime Louis, Hervé Déjean, Stéphane Clinchant
Retrieval-Augmented Generation (RAG) pipelines enhance Large Language Models (LLMs) by retrieving relevant documents, but they face scalability issues due to high inference costs and limited context size. Document compression is a practical solution, but current soft compression methods suffer from accuracy losses and require extensive pretraining. In this paper, we introduce PISCO, a novel method that achieves a 16x compression rate with minimal accuracy loss (0-3%) across diverse RAG-based question-answering (QA) tasks. Unlike existing approaches, PISCO requires no pretraining or annotated data, relying solely on sequence-level knowledge distillation from document-based questions. With the ability to fine-tune a 7-10B LLM in 48 hours on a single A100 GPU, PISCO offers a highly efficient and scalable solution. We present comprehensive experiments showing that PISCO outperforms existing compression models by 8% in accuracy.
arXiv.org Artificial Intelligence
Jan-27-2025
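The abstract does not describe the architecture in detail, but the core idea it names is sequence-level knowledge distillation for soft document compression: a compressor squeezes each retrieved document into a small set of embeddings, and a student reader is trained to reproduce answers that a teacher LLM generated while reading the full documents, so no human-annotated labels are needed. The sketch below is a rough illustration of that training loop under assumed components; all module names, dimensions, and hyperparameters (`DocCompressor`, `StudentReader`, `N_MEMORY`, etc.) are hypothetical and are not PISCO's actual implementation.

```python
# Minimal sketch (assumed, not PISCO's code) of sequence-level knowledge
# distillation for soft document compression in a RAG reader.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, D_MODEL, N_MEMORY = 32000, 512, 8  # e.g. 8 memory slots per ~128-token doc ~= 16x compression


class DocCompressor(nn.Module):
    """Compress a document's token sequence into N_MEMORY soft embeddings."""

    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, D_MODEL)
        self.memory_queries = nn.Parameter(torch.randn(N_MEMORY, D_MODEL))
        self.attn = nn.MultiheadAttention(D_MODEL, num_heads=8, batch_first=True)

    def forward(self, doc_ids):                       # doc_ids: (B, L_doc)
        doc = self.tok(doc_ids)                       # (B, L_doc, D)
        q = self.memory_queries.expand(doc_ids.size(0), -1, -1)
        memory, _ = self.attn(q, doc, doc)            # (B, N_MEMORY, D)
        return memory


class StudentReader(nn.Module):
    """Toy reader that predicts answer tokens from [memory ; question ; answer-so-far]."""

    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, memory, question_ids, answer_in_ids):
        ctx = torch.cat([memory, self.tok(question_ids), self.tok(answer_in_ids)], dim=1)
        h = self.encoder(ctx)
        n_ans = answer_in_ids.size(1)
        return self.lm_head(h[:, -n_ans:, :])         # logits over answer positions


def skd_step(compressor, student, doc_ids, question_ids, teacher_answer_ids, optim):
    """One step of sequence-level KD: the student is supervised on the answer
    *generated by the teacher* from the uncompressed document, not on gold labels."""
    memory = compressor(doc_ids)
    ans_in, ans_target = teacher_answer_ids[:, :-1], teacher_answer_ids[:, 1:]
    logits = student(memory, question_ids, ans_in)
    loss = F.cross_entropy(logits.reshape(-1, VOCAB), ans_target.reshape(-1))
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()
```

In this framing the "16x compression" simply means the compressor emits roughly one memory embedding per 16 document tokens, so the student's context (and hence inference cost) shrinks accordingly while the teacher's full-document answers provide the training signal.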