KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG
Yongjian Li, HaoCheng Chu, Yukun Yan, Zhenghao Liu, Shi Yu, Zheni Zeng, Ruobing Wang, Sen Song, Zhiyuan Liu, Maosong Sun
arXiv.org Artificial Intelligence
Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to access broader knowledge sources, yet factual inconsistencies persist due to noise in retrieved documents, even with advanced retrieval methods. We demonstrate that enhancing generative models' capacity to process noisy content is equally critical for robust performance. In this paper, we present KARE-RAG (Knowledge-Aware Refinement and Enhancement for RAG), which improves knowledge utilization through three key innovations: (1) structured knowledge representations that facilitate error detection during training, (2) Dense Direct Preference Optimization (DDPO), a refined training objective that prioritizes correction of critical errors, and (3) a contrastive data generation pipeline that maintains semantic consistency while rectifying factual inaccuracies. Experiments show our method significantly enhances standard RAG pipelines across model scales, improving both in-domain and out-of-domain task performance without compromising general capabilities. Notably, these gains are achieved with modest training data, suggesting data-efficient optimization is possible through targeted learning strategies. Our findings establish a new direction for RAG improvement: by improving how models learn to process retrieved content, we can enhance performance across diverse inference paradigms. All data and code will be publicly available on GitHub.
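The abstract does not spell out the DDPO objective, so as a rough orientation the following is a minimal sketch of the standard DPO preference loss it refines, plus a hypothetical per-token weighting illustrating how a "dense" variant could emphasize corrected tokens. The function names, the weighting scheme, and all numeric values are assumptions for illustration, not the paper's actual formulation.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss on sequence log-probabilities:
    -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r))).
    A larger margin for the chosen (corrected) answer lowers the loss."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def weighted_seq_logp(token_logps, weights):
    """Hypothetical 'dense' aggregation: per-token log-probs are weighted
    so that tokens flagged as critical corrections count more before the
    sequence score enters the DPO margin. The weighting itself is an
    assumption, not the paper's method."""
    return sum(w * lp for lp, w in zip(token_logps, weights))
```

For example, a pair where the policy prefers the corrected answer (`dpo_loss(-10, -12, -11, -11)`) yields a lower loss than the reversed preference (`dpo_loss(-12, -10, -11, -11)`), which is the gradient signal a preference-tuning stage would exploit.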
Jun-4-2025