MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

Liyan Tang, Philippe Laban, Greg Durrett

Apr-16-2024, arXiv.org Artificial Intelligence

Recognizing whether LLM output can be grounded in evidence is central to many NLP tasks: retrieval-augmented generation, summarization, document-grounded dialogue, and more. Current approaches to this kind of "fact-checking" verify each piece of a model generation against potential evidence using an LLM. However, this process can be very computationally expensive, requiring many LLM calls to check a single response. In this work, we show how to build small models that have GPT-4-level performance at 400x lower cost. We do this by constructing synthetic training data with GPT-4, creating realistic yet challenging instances of factual errors via a structured generation procedure. Training on this data teaches models to check each fact in the claim and to recognize synthesis of information across sentences. For evaluation, we unify pre-existing datasets, collected from recent work on fact-checking and grounding LLM generations, into a benchmark, LLM-AggreFact. Our best system, MiniCheck-FT5 (770M parameters), outperforms all systems of comparable size and reaches GPT-4 accuracy. We release LLM-AggreFact, code for data synthesis, and models.
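
To make the cost argument concrete, here is a minimal Python sketch of claim-level grounding checks. It is not MiniCheck's implementation or the API of the released code: `split_sentences`, `chunk_document`, `toy_overlap_scorer`, and `check_response` are hypothetical names, the sentence splitter stands in for real claim decomposition, the 0.8 threshold is arbitrary, and the lexical-overlap scorer is only a placeholder for a trained checker (such as MiniCheck-FT5) or an LLM call. Taking the maximum score over document chunks is one common aggregation, assumed here. The `calls` counter shows why LLM-based checking gets expensive: checker calls grow as (number of claims) × (number of chunks).

```python
import re
from typing import Callable, List, Set, Tuple

# Hypothetical checker interface: (evidence_chunk, claim) -> support score in [0, 1].
Scorer = Callable[[str, str], float]


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter, standing in for real claim decomposition."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_document(doc: str, max_words: int = 50) -> List[str]:
    """Split the grounding document into consecutive windows of at most max_words words."""
    words = doc.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_overlap_scorer(chunk: str, claim: str) -> float:
    """Placeholder for a trained checker: fraction of claim tokens that appear in the chunk."""
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & _tokens(chunk)) / len(claim_tokens)


def check_response(
    doc: str, response: str, scorer: Scorer, threshold: float = 0.8
) -> Tuple[bool, List[Tuple[str, float, bool]], int]:
    """Check every sentence of `response` against every chunk of `doc`.

    Returns (all_supported, [(claim, best_score, supported), ...], number_of_scorer_calls).
    """
    chunks = chunk_document(doc)
    results = []
    calls = 0
    for claim in split_sentences(response):
        # A claim counts as supported if its best-scoring chunk clears the threshold.
        best = 0.0
        for chunk in chunks:
            best = max(best, scorer(chunk, claim))
            calls += 1
        results.append((claim, best, best >= threshold))
    # The response is grounded only if every claim is supported.
    return all(ok for _, _, ok in results), results, calls


doc = ("MiniCheck-FT5 has 770M parameters. "
       "It was trained on synthetic data created with GPT-4.")
response = "MiniCheck-FT5 has 770M parameters. It was trained on human-written data."

grounded, per_claim, n_calls = check_response(doc, response, toy_overlap_scorer)
for claim, score, ok in per_claim:
    print(f"{score:.2f} {'supported' if ok else 'UNSUPPORTED'}: {claim}")
print("grounded:", grounded, "| scorer calls:", n_calls)
```

Here the second sentence is flagged, yet it still scores 5/7 (about 0.71) because most of its words appear in the document. That is exactly the weakness of surface overlap: a useful checker must catch a single wrong fact inside an otherwise well-supported sentence, which is the behavior the synthetic training data described above is meant to teach.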

computational linguistics, large language model, machine learning

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)
