KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG
Yongjian Li, HaoCheng Chu, Yukun Yan, Zhenghao Liu, Shi Yu, Zheni Zeng, Ruobing Wang, Sen Song, Zhiyuan Liu, Maosong Sun
arXiv.org Artificial Intelligence
Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to access broader knowledge sources, yet factual inconsistencies persist due to noise in retrieved documents, even with advanced retrieval methods. We demonstrate that enhancing generative models' capacity to process noisy content is equally critical for robust performance. In this paper, we present KARE-RAG (Knowledge-Aware Refinement and Enhancement for RAG), which improves knowledge utilization through three key innovations: (1) structured knowledge representations that facilitate error detection during training, (2) Dense Direct Preference Optimization (DDPO), a refined training objective that prioritizes correction of critical errors, and (3) a contrastive data generation pipeline that maintains semantic consistency while rectifying factual inaccuracies. Experiments show our method significantly enhances standard RAG pipelines across model scales, improving both in-domain and out-of-domain task performance without compromising general capabilities. Notably, these gains are achieved with modest training data, suggesting data-efficient optimization is possible through targeted learning strategies. Our findings establish a new direction for RAG improvement: by improving how models learn to process retrieved content, we can enhance performance across diverse inference paradigms. All data and code will be publicly available on GitHub.
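The abstract does not spell out the DDPO objective, so as a rough orientation the following is a minimal sketch of the standard DPO preference loss it refines, plus a hypothetical per-token weighting illustrating how a "dense" variant could emphasize corrected tokens. The function names, the weighting scheme, and all numeric values are assumptions for illustration, not the paper's actual formulation.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss on sequence log-probabilities:
    -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r))).
    A larger margin for the chosen (corrected) answer lowers the loss."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def weighted_seq_logp(token_logps, weights):
    """Hypothetical 'dense' aggregation: per-token log-probs are weighted
    so that tokens flagged as critical corrections count more before the
    sequence score enters the DPO margin. The weighting itself is an
    assumption, not the paper's method."""
    return sum(w * lp for lp, w in zip(token_logps, weights))
```

For example, a pair where the policy prefers the corrected answer (`dpo_loss(-10, -12, -11, -11)`) yields a lower loss than the reversed preference (`dpo_loss(-12, -10, -11, -11)`), which is the gradient signal a preference-tuning stage would exploit.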
Jun-4-2025