Reliable Decision Support with LLMs: A Framework for Evaluating Consistency in Binary Text Classification Applications

Megahed, Fadel M., Chen, Ying-Ju, Jones-Farmer, L. Allison, Lee, Younghwa, Wang, Jiawei Brooke, Zwetsloot, Inez M.

arXiv.org Machine Learning 

LLM-based annotation has become something of an academic Wild West: the lack of established practices and standards has raised concerns about the quality and validity of research. Some researchers warn that the ostensible simplicity of LLMs can be misleading, as the models are prone to bias, misunderstandings, and unreliable results [1, p.1]. Others, in contrast, report that LLMs outperform typical human annotators, with evidence consistent across different types of texts and time periods, and argue that this "strongly suggests that ChatGPT may already be a superior approach compared to crowd annotations on platforms such as MTurk"; at the very least, these findings demonstrate the importance of studying the text-annotation properties and capabilities of LLMs in more depth [2, p.2]. Together, these contrasting perspectives highlight the need to critically examine large language models (LLMs) for text annotation and classification. Although human annotation remains widespread, it poses considerable challenges: it is time-consuming and costly--up to $5 per annotation and $50 per hour for annotators [3]--and often suffers from inconsistencies stemming from the intricacies of language and the subjectivity of annotators [4].
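The consistency question the title raises can be made concrete: an LLM classifier run repeatedly on the same texts may return different labels. A minimal sketch of quantifying this via pairwise per-item agreement and its mean across items; the function names and the repeated-run data are illustrative assumptions, not taken from the paper:

```python
def per_item_agreement(labels):
    """Fraction of agreeing pairs among repeated binary labels for one item."""
    n = len(labels)
    if n < 2:
        return 1.0
    agree = sum(1 for i in range(n) for j in range(i + 1, n)
                if labels[i] == labels[j])
    return agree / (n * (n - 1) / 2)

def overall_consistency(runs):
    """Mean per-item agreement; runs[k] holds run k's labels for all items."""
    items = list(zip(*runs))  # transpose: one tuple of repeated labels per item
    return sum(per_item_agreement(item) for item in items) / len(items)

# Hypothetical: 3 repeated LLM runs over 4 texts (1 = positive class)
runs = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
score = overall_consistency(runs)  # 1.0 would mean perfectly repeatable labels
```

A score well below 1.0 on such repeated runs is exactly the kind of unreliability that motivates a formal evaluation framework rather than a single-run accuracy number.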
