Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation
Dong, Guanting, Zhu, Yutao, Zhang, Chenghao, Wang, Zechen, Dou, Zhicheng, Wen, Ji-Rong
arXiv.org Artificial Intelligence
Retrieval-augmented generation (RAG) has demonstrated effectiveness in mitigating the hallucination problem of large language models (LLMs). However, the difficulty of aligning the retriever with the diverse knowledge preferences of LLMs poses an inevitable challenge in developing a reliable RAG system. To address this issue, we propose DPA-RAG, a universal framework designed to align diverse knowledge preferences within RAG systems. Specifically, we first introduce a preference knowledge construction pipeline and incorporate five novel query augmentation strategies to alleviate preference data scarcity. Based on the preference data, DPA-RAG accomplishes both external and internal preference alignment: 1) it jointly integrates pair-wise, point-wise, and contrastive preference alignment abilities into the reranker, achieving external preference alignment among RAG components; 2) it further introduces a pre-aligned stage before vanilla Supervised Fine-tuning (SFT), enabling LLMs to implicitly capture knowledge aligned with their reasoning preferences and thereby achieving the LLMs' internal alignment. Experimental results across four knowledge-intensive QA datasets demonstrate that DPA-RAG outperforms all baselines and integrates seamlessly with both black-box and open-source LLM readers. Further qualitative analysis and discussions also provide empirical guidance for achieving reliable RAG systems. Our code is publicly available at https://github.com/dongguanting/DPA-RAG.
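The abstract names three preference signals that the reranker learns jointly (point-wise, pair-wise, and contrastive) but does not give their loss functions. As a rough illustration only, the sketch below combines them into one weighted multi-task objective. Everything concrete here is an assumption rather than DPA-RAG's published method: the function names are hypothetical, the specific forms (binary cross-entropy on logits, a RankNet-style logistic pair loss, InfoNCE with a temperature of 0.1) are standard stand-ins, and the equal weights are placeholders. "Preferred" documents are assumed to be those that helped the reader LLM answer correctly.

```python
import math
from typing import Sequence, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def pointwise_loss(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Binary cross-entropy on raw reranker scores (logits).

    Hypothetical label: 1 if the document helped the reader LLM answer, else 0.
    """
    # BCE-with-logits identity: -[y*log(sig(s)) + (1-y)*log(1-sig(s))] = softplus(s) - y*s
    losses = [softplus(s) - y * s for s, y in zip(scores, labels)]
    return sum(losses) / len(losses)


def pairwise_loss(preferred: Sequence[float], dispreferred: Sequence[float]) -> float:
    """RankNet-style logistic loss averaged over every (preferred, dispreferred) pair."""
    # -log(sigmoid(sp - sn)) == softplus(sn - sp)
    losses = [softplus(sn - sp) for sp in preferred for sn in dispreferred]
    return sum(losses) / len(losses)


def contrastive_loss(query: Sequence[float], positive: Sequence[float],
                     negatives: Sequence[Sequence[float]],
                     temperature: float = 0.1) -> float:
    """InfoNCE: the preferred document should be the one most similar to the query.

    The temperature value is an assumption, not taken from the paper.
    """
    def dot(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]


def multi_task_loss(point: float, pair: float, contrast: float,
                    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three objectives (equal default weights are a placeholder)."""
    w_point, w_pair, w_con = weights
    return w_point * point + w_pair * pair + w_con * contrast


if __name__ == "__main__":
    # Toy numbers for one query: one preferred and one dispreferred document.
    point = pointwise_loss([2.0, -1.0], [1, 0])
    pair = pairwise_loss([2.0], [-1.0])
    contrast = contrastive_loss([1.0, 0.0], [0.9, 0.1], [[0.1, 0.9]])
    print(multi_task_loss(point, pair, contrast))
```

Summing the terms lets one reranker receive all three signals in a single backward pass; in a real training loop these scalar functions would be replaced by tensor versions in a deep-learning framework, and the weights would be tuned rather than fixed at 1.0.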
Jun-26-2024