LLM Confidence Evaluation Measures in Zero-Shot CSS Classification
Farr, David, Cruickshank, Iain, Manzonelli, Nico, Clark, Nicholas, Starbird, Kate, West, Jevin
arXiv.org Artificial Intelligence
Assessing classification confidence is critical for leveraging large language models (LLMs) in automated labeling tasks, especially in the sensitive domains presented by Computational Social Science (CSS) tasks. In this paper, we make three key contributions: (1) we propose an uncertainty quantification (UQ) performance measure tailored for data annotation tasks, (2) we compare, for the first time, five different UQ strategies across three distinct LLMs and CSS data annotation tasks, and (3) we introduce a novel UQ aggregation strategy that effectively identifies low-confidence LLM annotations and disproportionately uncovers data incorrectly labeled by the LLMs. Our results demonstrate that our proposed UQ aggregation strategy improves upon existing methods and can be used to significantly improve human-in-the-loop data annotation processes.
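The abstract's specific aggregation method is not detailed here, but the general workflow it describes can be illustrated. The sketch below assumes each UQ strategy yields a confidence score in [0, 1] per annotation; scores are combined by a (hypothetical) weighted average, and annotations falling below a threshold are flagged for human review. The function name, weighting scheme, and threshold are illustrative assumptions, not the paper's method.

```python
import numpy as np

def aggregate_uq(scores, weights=None, threshold=0.5):
    """Combine per-strategy confidence scores into one aggregate score.

    scores: array of shape (n_items, n_strategies), each entry in [0, 1].
    weights: optional per-strategy weights (defaults to a uniform average).
    Returns (aggregate scores, boolean mask of items flagged for review).
    NOTE: illustrative sketch only; the paper's aggregation may differ.
    """
    scores = np.asarray(scores, dtype=float)
    if weights is None:
        weights = np.full(scores.shape[1], 1.0 / scores.shape[1])
    agg = scores @ np.asarray(weights, dtype=float)
    flagged = agg < threshold  # low-confidence: route to a human annotator
    return agg, flagged

# Two annotations scored by three hypothetical UQ strategies:
conf = [[0.9, 0.8, 0.95],   # consistently high confidence
        [0.4, 0.3, 0.60]]   # low confidence -> flagged for review
agg, flagged = aggregate_uq(conf)
```

In a human-in-the-loop pipeline, only the flagged items would be sent for manual relabeling, concentrating reviewer effort on the annotations the LLM is most likely to have gotten wrong.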
Nov-1-2024