Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations
Tong Chen, Akari Asai, Luke Zettlemoyer, Hannaneh Hajishirzi, Faeze Brahman
arXiv.org Artificial Intelligence
Language models often generate factually incorrect information unsupported by their training data, a phenomenon known as extrinsic hallucination. Existing mitigation approaches often degrade performance on open-ended generation and downstream tasks, limiting their practical utility. We propose an online reinforcement learning method using a novel binary retrieval-augmented reward (RAR) to address this tradeoff. Unlike continuous reward schemes, our approach assigns a reward of one only when the model's output is entirely factually correct, and zero otherwise. We evaluate our method on Qwen3 reasoning models across diverse tasks. For open-ended generation, binary RAR achieves a 39.3% reduction in hallucination rates, substantially outperforming both supervised training and continuous-reward RL baselines. In short-form question answering, the model learns calibrated abstention, strategically outputting "I don't know" when faced with insufficient parametric knowledge. This yields 44.4% and 21.7% fewer incorrect answers on PopQA and GPQA, respectively. Crucially, these factuality gains come without performance degradation on instruction following, math, or code, whereas continuous-reward RL, despite improving factuality, induces quality regressions.
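Read operationally, the reward the abstract describes is all-or-nothing: decompose a response into factual claims, check each against retrieved evidence, and pay 1 only if every claim is supported. Below is a minimal sketch of that scoring rule. The helpers extract_claims, retrieve, and is_supported are hypothetical stand-ins for the paper's claim-extraction and verification pipeline, which the abstract does not specify.

```python
# Minimal sketch of a binary retrieval-augmented reward (RAR).
# All three callables are assumptions, not the paper's actual API:
#   extract_claims: response -> atomic factual claims
#   retrieve:       claim -> evidence passages
#   is_supported:   (claim, evidence) -> verification verdict

from typing import Callable, List

def binary_rar(
    response: str,
    extract_claims: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    is_supported: Callable[[str, List[str]], bool],
) -> float:
    """Return 1.0 only if every extracted claim is supported; else 0.0."""
    for claim in extract_claims(response):
        evidence = retrieve(claim)
        if not is_supported(claim, evidence):
            return 0.0  # one unsupported claim zeroes the whole reward
    return 1.0
```

The all-or-nothing payoff, with no partial credit for mostly correct outputs, is the design choice the abstract contrasts with continuous reward schemes.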
Oct-21-2025