RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
Zhuoran Jin, Hongbang Yuan, Tianyi Men, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
arXiv.org Artificial Intelligence
Despite the significant progress made by existing retrieval augmented language models (RALMs) in providing trustworthy responses grounded in reliable sources, they often overlook effective alignment with human preferences. In the alignment process, reward models (RMs) act as a crucial proxy for human values to guide optimization. However, it remains unclear how to evaluate and select a reliable RM for preference alignment in RALMs. To this end, we propose RAG-RewardBench, the first benchmark for evaluating RMs in RAG settings. First, we design four crucial and challenging RAG-specific scenarios to assess RMs: multi-hop reasoning, fine-grained citation, appropriate abstain, and conflict robustness. Then, we incorporate 18 RAG subsets, six retrievers, and 24 RALMs to increase the diversity of data sources. Finally, we adopt an LLM-as-a-judge approach to improve the efficiency and effectiveness of preference annotation; its annotations correlate strongly with human annotations. Based on RAG-RewardBench, we conduct a comprehensive evaluation of 45 RMs and uncover their limitations in RAG scenarios. Additionally, we reveal that existing trained RALMs show almost no improvement in preference alignment, highlighting the need for a shift towards preference-aligned training. We release our benchmark and code publicly at https://huggingface.co/datasets/jinzhuoran/RAG-RewardBench/ for future work.
Dec-18-2024
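The abstract does not spell out how an RM is scored on the benchmark. A common convention for preference benchmarks is pairwise accuracy: for each (prompt, chosen, rejected) triple, the RM is counted correct when it scores the chosen response strictly higher than the rejected one, with results reported per subset and overall. The sketch below assumes that convention. `PreferencePair`, `pairwise_accuracy`, `toy_score`, and the subset labels are hypothetical illustrations, not the released dataset's schema or the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class PreferencePair:
    # Hypothetical record layout; the released dataset may use different fields.
    subset: str    # e.g. a scenario label such as "citation" or "abstain"
    prompt: str    # question plus retrieved documents
    chosen: str    # preferred response
    rejected: str  # dispreferred response


def pairwise_accuracy(
    pairs: List[PreferencePair],
    score: Callable[[str, str], float],
) -> Tuple[float, Dict[str, float]]:
    """Fraction of pairs where score(chosen) > score(rejected); ties count as wrong."""
    if not pairs:
        raise ValueError("need at least one preference pair")
    totals: Dict[str, List[int]] = {}  # subset -> [num_correct, num_pairs]
    for p in pairs:
        correct = int(score(p.prompt, p.chosen) > score(p.prompt, p.rejected))
        stats = totals.setdefault(p.subset, [0, 0])
        stats[0] += correct
        stats[1] += 1
    per_subset = {name: c / n for name, (c, n) in totals.items()}
    overall = sum(c for c, _ in totals.values()) / sum(n for _, n in totals.values())
    return overall, per_subset


def toy_score(prompt: str, response: str) -> float:
    # Hypothetical stand-in for a real RM: it simply rewards citation markers.
    return float(response.count("["))


pairs = [
    PreferencePair("citation", "Q: capital of France? Docs: [1] ...",
                   "Paris is the capital [1].", "Paris is the capital."),
    PreferencePair("abstain", "Q: ... Docs: (irrelevant)",
                   "The documents do not answer this.", "It is 42 [1]."),
]
overall, per_subset = pairwise_accuracy(pairs, toy_score)
print(overall, per_subset)  # 0.5 {'citation': 1.0, 'abstain': 0.0}
```

The toy scorer illustrates why per-scenario reporting matters: a heuristic that rewards citations looks perfect on the citation pair but prefers a fabricated, cited answer over an appropriate abstention.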