Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations
Siyao Peng, Zihang Sun, Sebastian Loftus, Barbara Plank
arXiv.org Artificial Intelligence
Named Entity Recognition (NER) is a key information extraction task with a long-standing tradition. While recent studies address and aim to correct annotation errors via re-labeling efforts, little is known about the sources of human label variation, such as text ambiguity, annotation error, or guideline divergence. This is especially the case for high-quality datasets and beyond English CoNLL03. This paper studies disagreements in expert-annotated named entity datasets for three languages: English, Danish, and Bavarian. We show that text ambiguity and artificial guideline changes are dominant factors for diverse annotations among high-quality revisions. We survey student annotations on a subset of difficult entities and substantiate the feasibility and necessity of manifold annotations for understanding named entity ambiguities from a distributional perspective.
Feb-2-2024
- Country:
- Africa > South Africa
- Western Cape > Cape Town (0.04)
- Asia
- China > Hong Kong (0.04)
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Singapore (0.04)
- Europe
- Ireland (0.04)
- Sweden > Östergötland County
- Linköping (0.04)
- Denmark > Southern Denmark
- Vejle (0.04)
- Italy (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Finland > Southwest Finland
- Turku (0.04)
- Spain (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Germany > Bavaria
- Upper Bavaria > Munich (0.05)
- North America
- Canada > Ontario
- Toronto (0.05)
- Costa Rica (0.04)
- Dominican Republic (0.04)
- United States
- Georgia > Fulton County
- Atlanta (0.04)
- Maryland > Baltimore (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.04)
- New Mexico > Santa Fe County
- Santa Fe (0.04)
- New York (0.05)
- Oceania > Australia (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Leisure & Entertainment > Sports (0.94)
- Technology: