AILS-NTUA at SemEval-2024 Task 6: Efficient model tuning for hallucination detection and analysis

Grigoriadou, Natalia, Lymperaiou, Maria, Filandrianos, Giorgos, Stamou, Giorgos

Apr-12-2024, arXiv.org Artificial Intelligence

In this paper, we present our team's submissions for SemEval-2024 Task 6 (SHROOM), a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. Participants were asked to perform binary classification to identify cases of fluent overgeneration hallucinations. Our experiments included fine-tuning a pre-trained model on hallucination detection and a Natural Language Inference (NLI) model. The most successful strategy was an ensemble of these models, which reached accuracies of 77.8% and 79.9% on the model-agnostic and model-aware datasets respectively. This outperforms the organizers' baseline and is a notable result when set against the top-performing systems in the competition, which reported accuracies of 84.7% and 81.3% respectively.
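
The abstract does not state how the ensemble combines its members. As a minimal sketch only, assuming soft voting (averaging each model's predicted hallucination probability and thresholding at 0.5), the idea could look like the following. The function names, the threshold, and the example scores are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average per-model hallucination probabilities and threshold the mean.

    Assumption: each inner sequence holds one model's P(hallucination)
    for the same ordered set of examples.
    """
    n_models = len(prob_lists)
    n_items = len(prob_lists[0])
    labels = []
    for i in range(n_items):
        mean_p = sum(model[i] for model in prob_lists) / n_models
        labels.append(1 if mean_p >= threshold else 0)
    return labels


def accuracy(pred: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


# Hypothetical scores for four examples (illustration only, not real data).
detector_probs = [0.9, 0.2, 0.6, 0.1]  # fine-tuned hallucination detector
nli_probs = [0.7, 0.4, 0.3, 0.2]       # NLI-based model
gold = [1, 0, 1, 0]

pred = soft_vote([detector_probs, nli_probs])
print(pred, accuracy(pred, gold))
```

Other combination rules, such as majority voting or weighted averaging, would fit the same interface; which one the authors used is not specified here.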


