Reliable Decision Support with LLMs: A Framework for Evaluating Consistency in Binary Text Classification Applications

Megahed, Fadel M., Chen, Ying-Ju, Jones-Farmer, L. Allison, Lee, Younghwa, Wang, Jiawei Brooke, Zwetsloot, Inez M.

arXiv.org Machine Learning 

LLM-based annotation has become something of an academic Wild West: the lack of established practices and standards has raised concerns about the quality and validity of research. Some researchers warn that the ostensible simplicity of LLMs can be misleading, as the models are prone to bias, misunderstandings, and unreliable results [1, p.1]. Others, in contrast, report that LLMs outperform typical human annotators, with evidence consistent across different types of texts and time periods, and argue that this "strongly suggests that ChatGPT may already be a superior approach compared to crowd annotations on platforms such as MTurk"; at the very least, these findings demonstrate the importance of studying the text-annotation properties and capabilities of LLMs in more depth [2, p.2]. Together, these contrasting perspectives highlight the need to critically examine large language models (LLMs) for text annotation and classification. Although human annotation remains widespread, it poses considerable challenges: it is time-consuming and costly--up to $5 per annotation and $50 per hour for annotators [3]--and often suffers from inconsistencies stemming from the intricacies of language and the subjectivity of annotators [4].
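The consistency question the title raises can be made concrete: an LLM classifier run repeatedly on the same texts may return different labels. A minimal sketch of quantifying this via pairwise per-item agreement and its mean across items; the function names and the repeated-run data are illustrative assumptions, not taken from the paper:

```python
def per_item_agreement(labels):
    """Fraction of agreeing pairs among repeated binary labels for one item."""
    n = len(labels)
    if n < 2:
        return 1.0
    agree = sum(1 for i in range(n) for j in range(i + 1, n)
                if labels[i] == labels[j])
    return agree / (n * (n - 1) / 2)

def overall_consistency(runs):
    """Mean per-item agreement; runs[k] holds run k's labels for all items."""
    items = list(zip(*runs))  # transpose: one tuple of repeated labels per item
    return sum(per_item_agreement(item) for item in items) / len(items)

# Hypothetical: 3 repeated LLM runs over 4 texts (1 = positive class)
runs = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
score = overall_consistency(runs)  # 1.0 would mean perfectly repeatable labels
```

A score well below 1.0 on such repeated runs is exactly the kind of unreliability that motivates a formal evaluation framework rather than a single-run accuracy number.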
