Bi'an: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation

Jiang, Zhouyu, Sun, Mengshu, Zhang, Zhiqiang, Liang, Lei

Feb-26-2025–arXiv.org Artificial Intelligence

Retrieval-Augmented Generation (RAG) effectively reduces hallucinations in Large Language Models (LLMs) but can still produce inconsistent or unsupported content. Although LLM-as-a-Judge is widely used for RAG hallucination detection due to its implementation simplicity, it faces two main challenges: the absence of comprehensive evaluation benchmarks and the lack of domain-optimized judge models. To bridge these gaps, we introduce \textbf{Bi'an}, a novel framework featuring a bilingual benchmark dataset and lightweight judge models. The dataset supports rigorous evaluation across multiple RAG scenarios, while the judge models are fine-tuned from compact open-source LLMs. Extensive experimental evaluations on Bi'anBench show our 14B model outperforms baseline models with over five times larger parameter scales and rivals state-of-the-art closed-source LLMs. We will release our data and models soon at https://github.com/OpenSPG/KAG.

computational linguistic, hallucination detection, proceedings, (13 more...)

arXiv.org Artificial Intelligence

Feb-26-2025

arXiv.org PDF

Add feedback

Country:
- North America
  - Dominican Republic (0.04)
  - United States
    - Washington > King County
      - Seattle (0.04)
    - New Mexico > Bernalillo County
      - Albuquerque (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Florida > Miami-Dade County
      - Miami (0.04)
  - Mexico > Mexico City
    - Mexico City (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Moldova (0.04)
  - Spain (0.04)
  - Portugal (0.04)
  - Middle East > Malta (0.04)
  - Germany (0.04)
  - Romania > București - Ilfov Development Region
    - Municipality of Bucharest > Bucharest (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
- Asia
  - Singapore (0.04)
  - India (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
  - Middle East
    - Jordan (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
  - Malaysia > Melaka
    - Malacca (0.05)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found