Hypothesis-only Biases in Large Language Model-Elicited Natural Language Inference
Grace Proebsting, Adam Poliak
arXiv.org Artificial Intelligence
We test whether replacing crowdworkers with LLMs to write Natural Language Inference (NLI) hypotheses similarly results in annotation artifacts. We recreate a portion of the Stanford NLI corpus using GPT-4, Llama-2, and Mistral 7B, and train hypothesis-only classifiers to determine whether the LLM-elicited hypotheses contain annotation artifacts. On our LLM-elicited NLI datasets, BERT-based hypothesis-only classifiers achieve between 86% and 96% accuracy, indicating that these datasets contain hypothesis-only artifacts. We also find frequent "give-aways" in LLM-generated hypotheses; e.g., the phrase "swimming in a pool" appears in more than 10,000 contradictions generated by GPT-4. Our analysis provides empirical evidence that well-attested biases in NLI can persist in LLM-generated data.
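The hypothesis-only baseline the abstract describes works by discarding the premise entirely and training a classifier on hypotheses alone; any accuracy above chance must come from annotation artifacts. The paper uses BERT-based classifiers, but the idea can be sketched with a much lighter bag-of-words model. Everything below is illustrative: the example sentences and labels are toy data invented for this sketch, not drawn from the paper's datasets.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy hypothetical examples: the classifier never sees a premise.
# A give-away phrase ("swimming in a pool" marking contradictions,
# as the abstract reports for GPT-4) lets it beat the chance
# baseline using surface cues alone.
hypotheses = [
    "A man is swimming in a pool",
    "A dog is swimming in a pool",
    "The child is swimming in a pool",
    "A person is outside",
    "Someone is outdoors",
    "A human is outside in the open air",
]
labels = ["contradiction"] * 3 + ["entailment"] * 3

# Bag-of-n-grams + logistic regression stands in for the paper's
# BERT-based hypothesis-only classifier.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 3)), LogisticRegression())
clf.fit(hypotheses, labels)

# High accuracy here reflects artifacts in the hypotheses,
# not genuine inference over premises.
print(clf.score(hypotheses, labels))
```

In a real artifact audit, the classifier would be trained on the dataset's hypothesis/label pairs and evaluated on a held-out split; accuracy well above the majority-class baseline signals annotation artifacts.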
Oct-11-2024