Training Plug-n-Play Knowledge Modules with Deep Context Distillation

Caccia, Lucas, Ansell, Alan, Ponti, Edoardo, Vulić, Ivan, Sordoni, Alessandro

Mar-10-2025–arXiv.org Artificial Intelligence

Dynamically integrating new or rapidly evolving information after (Large) Language Model pre-training remains challenging, particularly in low-data scenarios or when dealing with private and specialized documents. In-context learning and retrieval-augmented generation (RAG) face limitations, including their high inference costs and their inability to capture global document information. In this paper, we propose a way of modularizing knowledge by training document-level Knowledge Modules (KMs). KMs are lightweight components implemented as parameter-efficient LoRA modules, which are trained to store information about new documents and can be easily plugged into models on demand. We show that next-token prediction performs poorly as the training objective for KMs. We instead propose Deep Context Distillation: we learn KMs parameters such as to simulate hidden states and logits of a teacher that takes the document in context. Our method outperforms standard next-token prediction and pre-instruction training techniques, across two datasets. Finally, we highlight synergies between KMs and retrieval-augmented generation.

arxiv preprint arxiv, context distillation, information, (12 more...)

arXiv.org Artificial Intelligence

Mar-10-2025

arXiv.org PDF

Add feedback

Country:
- North America
  - United States > Washington
    - King County > Seattle (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Europe
  - Austria > Vienna (0.14)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report (0.41)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.56)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found