ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models
Li, Jierui, Raheja, Vipul, Kumar, Dhruv
–arXiv.org Artificial Intelligence
In recent times, large language models (LLMs) have shown impressive performance on various document-level tasks such as document classification, summarization, and question-answering. However, research on their ability to detect self-contradictions in long documents has been very limited. In this work, we introduce ContraDoc, the first human-annotated dataset for studying self-contradictions in long documents, spanning multiple domains, document lengths, self-contradiction types, and scopes. We then analyze the current capabilities of four state-of-the-art open-source and commercially available LLMs on this dataset: GPT3.5, GPT4, PaLM2, and LLaMAv2. While GPT4 performs the best and can outperform humans on this task, we find that it is still unreliable and struggles with self-contradictions that require more nuance and context. We release the dataset and all the code associated with the experiments.
Nov-15-2023