RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
Jin, Zhuoran, Yuan, Hongbang, Men, Tianyi, Cao, Pengfei, Chen, Yubo, Liu, Kang, Zhao, Jun
arXiv.org Artificial Intelligence
Despite the significant progress made by existing retrieval augmented language models (RALMs) in providing trustworthy responses grounded in reliable sources, they often overlook effective alignment with human preferences. In the alignment process, reward models (RMs) act as a crucial proxy for human values to guide optimization. However, it remains unclear how to evaluate and select a reliable RM for preference alignment in RALMs. To this end, we propose RAG-RewardBench, the first benchmark for evaluating RMs in RAG settings. First, we design four crucial and challenging RAG-specific scenarios to assess RMs: multi-hop reasoning, fine-grained citation, appropriate abstain, and conflict robustness. Then, we incorporate 18 RAG subsets, six retrievers, and 24 RALMs to increase the diversity of data sources. Finally, we adopt an LLM-as-a-judge approach to improve the efficiency and effectiveness of preference annotation; its annotations show a strong correlation with human annotations. Using RAG-RewardBench, we conduct a comprehensive evaluation of 45 RMs and uncover their limitations in RAG scenarios. We also reveal that existing trained RALMs show almost no improvement in preference alignment, highlighting the need for a shift towards preference-aligned training. We release our benchmark and code publicly at https://huggingface.co/datasets/jinzhuoran/RAG-RewardBench/ for future work.
Dec-18-2024
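The abstract does not say how each of the 45 RMs is scored. A common convention for reward-model benchmarks, used for example by RewardBench, is pairwise accuracy: the fraction of (chosen, rejected) response pairs for which the RM assigns the chosen response a strictly higher reward. The sketch below assumes that convention and is not taken from the paper. The `PreferencePair` type, the `pairwise_accuracy` function, and the example scores are all hypothetical. Only the scenario labels come from the abstract.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class PreferencePair(NamedTuple):
    """Hypothetical benchmark item: an RM's rewards for one chosen/rejected pair."""
    scenario: str          # e.g. "multi-hop reasoning" (scenario names from the abstract)
    chosen_score: float    # reward the RM assigns to the preferred response
    rejected_score: float  # reward the RM assigns to the dispreferred response


def pairwise_accuracy(pairs: Iterable[PreferencePair]) -> Dict[str, float]:
    """Fraction of pairs where the RM scores the chosen response strictly higher.

    Ties count as errors (an assumption). Returns one accuracy per scenario
    plus an "overall" entry, computed as a micro-average over all pairs.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for p in pairs:
        hit = int(p.chosen_score > p.rejected_score)
        for key in (p.scenario, "overall"):
            correct[key] += hit
            total[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Illustrative, made-up rewards; not results from RAG-RewardBench.
pairs = [
    PreferencePair("multi-hop reasoning", 2.1, 0.3),
    PreferencePair("multi-hop reasoning", -0.5, 0.4),
    PreferencePair("appropriate abstain", 1.0, 1.0),  # tie, counted as wrong
    PreferencePair("conflict robustness", 0.9, -1.2),
]
scores = pairwise_accuracy(pairs)
print(scores)
```

The "overall" value here weights every pair equally. A benchmark that reports a macro-average over scenarios or subsets would weight small subsets more heavily, so the two numbers can differ.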