Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks

Alizadeh, Meysam, Kubli, Maël, Samei, Zeynab, Dehghani, Shirin, Bermeo, Juan Diego, Korobeynikova, Maria, Gilardi, Fabrizio

arXiv.org Artificial Intelligence 

For instance, studies demonstrate that ChatGPT exceeds the performance of crowd workers in tasks encompassing relevance, stance, sentiment, topic identification, and frame detection (Gilardi, Alizadeh and Kubli, 2023), that it outperforms trained annotators in detecting the political party affiliations of Twitter users (Törnberg, 2023), and that it achieves accuracy scores over 0.6 for tasks such as stance, sentiment, hate speech detection, and bot identification (Zhu et al., 2023). Notably, ChatGPT also correctly classifies more than 70% of news as either true or false (Hoes, Altay and Bermeo, 2023), which suggests that LLMs could be used to assist content moderation processes. While the performance of LLMs for text annotation is promising, several aspects remain unclear and require further research. Among these is the impact of different approaches, such as zero-shot versus few-shot learning, and of settings such as the temperature parameter. In zero-shot learning, a model makes predictions for a task for which it has been given no examples, while in few-shot learning it generalizes to a new task from a small number of examples. The conditions under which one approach outperforms the other are not yet fully understood.
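To make the distinction concrete, the following Python sketch shows one way a zero-shot and a few-shot annotation prompt might be assembled, and how a temperature value reshapes a model's output distribution. It is a minimal illustration under stated assumptions: the function names `build_prompt` and `softmax_with_temperature`, the "Text:/Label:" prompt layout, and the example sentences are hypothetical and are not taken from the study.

```python
import math
from typing import List, Sequence, Tuple


def build_prompt(
    instruction: str,
    text: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble an annotation prompt (hypothetical layout).

    With no examples the result is a zero-shot prompt; with a few
    (text, label) pairs it becomes a few-shot prompt.
    """
    parts = [instruction.strip()]
    for example_text, label in examples:
        parts.append(f"Text: {example_text}\nLabel: {label}")
    # The text to annotate comes last, with the label left open.
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature.

    Lower temperatures concentrate probability on the top-scoring label;
    higher temperatures flatten the distribution.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    total = sum(weights)
    return [weight / total for weight in weights]


if __name__ == "__main__":
    instruction = "Classify the sentiment of the text as positive or negative."
    target = "The new policy is a disaster."

    # Zero-shot: instruction and target text only.
    print(build_prompt(instruction, target))
    print("---")
    # Few-shot: the same prompt preceded by two labeled examples (made up).
    demos = [("I love this plan.", "positive"), ("This is awful.", "negative")]
    print(build_prompt(instruction, target, demos))
    print("---")
    # Illustrative logits for three candidate labels at two temperatures.
    print(softmax_with_temperature([2.0, 1.0, 0.0], 0.2))
    print(softmax_with_temperature([2.0, 1.0, 0.0], 1.0))
```

The only difference between the two prompts is the block of labeled examples, which is why the comparison between zero-shot and few-shot settings can be run on the same texts with the same instruction.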
