Your Model is Overconfident, and Other Lies We Tell Ourselves
Timothee Mickus, Aman Sinha, Raúl Vázquez
arXiv.org Artificial Intelligence
The difficulty intrinsic to a given example, rooted in its inherent ambiguity, is a key yet often overlooked factor in evaluating neural NLP models. We investigate the interplay and divergence among various metrics for assessing intrinsic difficulty, including annotator dissensus, training dynamics, and model confidence. Through a comprehensive analysis using 29 models on three datasets, we reveal that while correlations exist among these metrics, their relationships are neither linear nor monotonic. By disentangling these dimensions of uncertainty, we aim to refine our understanding of data complexity and its implications for evaluating and improving NLP models.
March 3, 2025
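The abstract names three families of difficulty signals (annotator dissensus, training dynamics, and model confidence) and reports that their relationships are neither linear nor monotonic. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure. It assumes dissensus is measured as the Shannon entropy of the annotators' labels and confidence as the model's maximum predicted probability (both are our choices, not definitions taken from the paper). It leaves out training dynamics and contrasts Pearson correlation, which detects linear association, with Spearman rank correlation, which detects monotonic association. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def annotator_entropy(labels):
    """Assumed dissensus measure: Shannon entropy (bits) of the label distribution.

    0.0 means all annotators agree; larger values mean more disagreement.
    """
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def max_prob_confidence(probs):
    """Assumed confidence measure: the highest predicted class probability."""
    return max(probs)


def pearson(xs, ys):
    """Pearson correlation; captures only linear association.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson on ranks, so it captures monotonic association."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical items: five annotator labels plus a model's softmax output each.
items = [
    (["pos", "pos", "pos", "pos", "pos"], [0.95, 0.05]),
    (["pos", "pos", "pos", "neg", "neg"], [0.70, 0.30]),
    (["pos", "neg", "neu", "pos", "neg"], [0.40, 0.35, 0.25]),
]
dissensus = [annotator_entropy(labels) for labels, _ in items]
confidence = [max_prob_confidence(p) for _, p in items]
print(spearman(dissensus, confidence))  # strongly negative on this toy data

# A deterministic but non-monotonic relationship: y = x**2.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
print(pearson(xs, ys), spearman(xs, ys))  # → 0.0 0.0
```

On the quadratic pair both coefficients are exactly zero even though `ys` is fully determined by `xs`. This is why a single correlation score can understate how strongly two difficulty signals are related when the relationship is neither linear nor monotonic.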