Biases in Large Language Model-Elicited Text: A Case Study in Natural Language Inference

Grace Proebsting, Adam Poliak

arXiv.org Artificial Intelligence 

On our LLM-generated NLI datasets, fine-tuned BERT classifiers achieve 86-96% accuracy when given only the hypotheses, compared to 72% performance on SNLI. We also find the LLM-generated datasets contain similar gender stereotypes as SNLI. Our research suggests that while eliciting text from LLMs to generate NLP datasets is enticing and promising, thorough quality control is necessary.

Creating NLP datasets with Large Language Models (LLMs) is an attractive alternative to relying on crowd-source workers (Ziems et al., 2024). Compared to crowd-source workers, LLMs are inexpensive, fast, and always available. Although LLMs require validation (Pangakis et al., 2023), they are an efficient tool to annotate data (Zhao et al., 2022; Bansal and Sharma, 2023; Gilardi et al., 2023; He
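The hypothesis-only evaluation cited above is the standard probe for annotation artifacts in NLI: a classifier fine-tuned on hypotheses alone, with no premise, should stay near the roughly 33% chance level for the three-way task unless the hypotheses themselves leak label information. Below is a minimal sketch of such a baseline, assuming the Hugging Face transformers and datasets libraries and the public SNLI corpus; the bert-base-uncased checkpoint, hyperparameters, and directory names are illustrative assumptions, not the authors' exact setup, and the paper's LLM-generated datasets are not shown.

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
import numpy as np

MODEL_NAME = "bert-base-uncased"  # assumed checkpoint; the paper fine-tunes BERT classifiers

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

# SNLI from the Hub; examples with no gold label are marked -1 and dropped.
snli = load_dataset("snli").filter(lambda ex: ex["label"] != -1)

def encode(batch):
    # Key step: tokenize ONLY the hypothesis and discard the premise, so any
    # accuracy well above chance reflects artifacts in the elicited hypotheses
    # rather than genuine inference over premise-hypothesis pairs.
    return tokenizer(batch["hypothesis"], truncation=True, max_length=128)

encoded = snli.map(encode, batched=True)

def compute_metrics(eval_pred):
    preds = np.argmax(eval_pred.predictions, axis=-1)
    return {"accuracy": float((preds == eval_pred.label_ids).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hypothesis-only-bert",  # assumed output path
                           per_device_train_batch_size=32,
                           num_train_epochs=3),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate(encoded["test"]))

Run against SNLI, a hypothesis-only model of this kind lands near the 72% figure cited above; the same probe applied to the LLM-generated datasets is what yields the 86-96% accuracies the abstract reports, indicating stronger hypothesis-side artifacts in the elicited text.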