Do Language Models Know When They're Hallucinating References?
Ayush Agrawal, Mirac Suzgun, Lester Mackey, Adam Tauman Kalai
State-of-the-art language models (LMs) are famous for "hallucinating" references. These fabricated article and book titles cause harm, create obstacles to the models' use, and provoke public backlash. While other types of LM hallucination are also important, we propose hallucinated references as the "drosophila" of research on hallucination in large language models (LLMs), as they are particularly easy to study. We show that simple search-engine queries reliably identify such hallucinations, which facilitates evaluation. To begin to dissect the nature of hallucinated LM references, we attempt to classify them using black-box queries to the same LM, without consulting any external resources. Consistency checks using "direct" queries about whether the generated reference title is real (inspired by Kadavath et al. 2022, Lin et al. 2022, Manakul et al. 2023) are compared to consistency checks using "indirect" queries, which ask for ancillary details such as the authors of the work. These consistency checks are found to be partially reliable indicators of whether or not a reference is a hallucination. In particular, we find that LMs often produce differing authors for hallucinated references when queried in independent sessions, while they consistently identify the authors of real references. This suggests that hallucination may be more of a generation issue than something inherent to current training techniques or representations.
arXiv.org Artificial Intelligence
Sep-13-2023
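
The indirect-query consistency check described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the query_lm placeholder, the prompt wording, the string-similarity measure, and the agreement threshold are all hypothetical choices made for exposition.

from difflib import SequenceMatcher
from itertools import combinations

def query_lm(prompt: str, session: int) -> str:
    # Hypothetical placeholder: send `prompt` to the language model in an
    # independent session and return its text completion.
    raise NotImplementedError("Replace with a call to your LM client.")

def author_agreement(answers: list[str]) -> float:
    # Mean pairwise string similarity between the author lists the LM returned.
    if len(answers) < 2:
        return 1.0
    pairs = list(combinations(answers, 2))
    return sum(
        SequenceMatcher(None, a.lower(), b.lower()).ratio() for a, b in pairs
    ) / len(pairs)

def looks_hallucinated(title: str, sessions: int = 3, threshold: float = 0.5) -> bool:
    # Indirect query: ask for ancillary details (the authors) in several
    # independent sessions and flag the reference when the answers disagree;
    # real references tend to elicit consistent author lists.
    prompt = f'Who are the authors of the work titled "{title}"? List only the names.'
    answers = [query_lm(prompt, session=i) for i in range(sessions)]
    return author_agreement(answers) < threshold

In practice one would substitute a real LM client for query_lm and tune the threshold against references verified by search-engine lookup, as the paper uses search queries for ground-truth evaluation.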