NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation
Nandan Thakur, Luiz Bonifacio, Xinyu Zhang, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Boxing Chen, Mehdi Rezagholizadeh, Jimmy Lin
arXiv.org Artificial Intelligence
Retrieval-augmented generation (RAG) grounds large language model (LLM) output in external knowledge sources to reduce factual hallucinations. However, prior work lacks a comprehensive evaluation across different language families, making it difficult to assess LLM robustness to errors in externally retrieved knowledge. To address this, we introduce NoMIRACL, a human-annotated dataset for evaluating LLM robustness in RAG across 18 typologically diverse languages. NoMIRACL includes both a non-relevant and a relevant subset. Queries in the non-relevant subset are paired only with passages manually judged non-relevant or noisy, whereas each query in the relevant subset has at least one passage judged relevant. We measure LLM robustness using two metrics: (i) hallucination rate, the model's tendency to hallucinate an answer when none is present in the passages of the non-relevant subset, and (ii) error rate, the model's failure to recognize relevant passages in the relevant subset. We build a GPT-4 baseline that achieves, on average, a 33.2% hallucination rate on the non-relevant subset and a 14.9% error rate on the relevant subset. Our evaluation reveals that GPT-4 hallucinates frequently in high-resource languages such as French and English. This work highlights an important avenue for future research: improving LLM robustness so that models learn to better reject non-relevant information in RAG.
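The two metrics can be illustrated with a minimal sketch. It assumes each model response has already been reduced to a binary decision (the model either claims an answer is present or abstains), which is our reading of the abstract and not a confirmed detail of the evaluation protocol. The `Judgment` record and both function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Judgment:
    """One evaluated query (hypothetical record, not the dataset's schema)."""
    has_relevant_passage: bool  # True if the query belongs to the relevant subset
    model_claims_answer: bool   # True if the model says an answer is present


def _rate(flags: List[bool]) -> float:
    # Fraction of True values; an empty subset has no defined rate.
    if not flags:
        raise ValueError("subset is empty; rate is undefined")
    return sum(flags) / len(flags)


def hallucination_rate(judgments: Iterable[Judgment]) -> float:
    """Share of non-relevant-subset queries where the model still claims an answer."""
    return _rate([j.model_claims_answer
                  for j in judgments if not j.has_relevant_passage])


def error_rate(judgments: Iterable[Judgment]) -> float:
    """Share of relevant-subset queries where the model fails to claim an answer."""
    return _rate([not j.model_claims_answer
                  for j in judgments if j.has_relevant_passage])


# Toy data (made up for illustration, not taken from NoMIRACL):
# 4 non-relevant queries with 1 false claim, 5 relevant queries with 1 miss.
sample = (
    [Judgment(False, True)] + [Judgment(False, False)] * 3
    + [Judgment(True, False)] + [Judgment(True, True)] * 4
)
print(hallucination_rate(sample))  # 0.25
print(error_rate(sample))          # 0.2
```

Under this reading, the reported averages would correspond to a hallucination rate of 0.332 and an error rate of 0.149 for the GPT-4 baseline, each computed over its own subset.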
Dec-18-2023