Topic Bias in Emotion Classification
Wegge, Maximilian, Klinger, Roman
arXiv.org Artificial Intelligence
Emotion corpora are typically sampled via keyword or hashtag search, or by asking study participants to generate textual instances. In either case, these corpora are not uniform samples representing the entirety of a domain. We hypothesize that this practice of data acquisition leads to unrealistic correlations between overrepresented topics and emotion labels in these corpora, which harm the generalizability of models. Such topic bias could lead to wrong predictions for instances like "I organized the service for my aunt's funeral" when funeral events are overrepresented among instances labeled with sadness, even though pride is the more appropriate emotion here. In this paper, we study this topic bias from both the data and the modeling perspective. We first label a set of emotion corpora automatically via topic modeling and show that emotions do in fact correlate with specific topics. Further, we see that emotion classifiers are confounded by such topics. Finally, we show that the established debiasing method of adversarial correction via gradient reversal mitigates the issue. Our work points out issues with existing emotion corpora and shows that more representative resources are required for fair evaluation of models predicting affective concepts from text.
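The claim that emotions correlate with specific topics can be made concrete with a simple association score. The sketch below uses pointwise mutual information (PMI) over (topic, emotion) label pairs; this is one possible measure chosen for illustration, not necessarily the one used by the authors. The function name `topic_emotion_pmi` and the toy labels are hypothetical and are not data from the paper.

```python
import math
from collections import Counter


def topic_emotion_pmi(pairs):
    """Return {(topic, emotion): PMI in bits} for every observed pair.

    PMI > 0 means the topic and emotion co-occur more often than
    expected if they were independent; PMI < 0 means less often.
    """
    n = len(pairs)
    joint = Counter(pairs)
    topic_counts = Counter(topic for topic, _ in pairs)
    emotion_counts = Counter(emotion for _, emotion in pairs)
    return {
        (topic, emotion): math.log2(
            (count * n) / (topic_counts[topic] * emotion_counts[emotion])
        )
        for (topic, emotion), count in joint.items()
    }


# Toy, made-up labels (assumption for illustration only).
toy = [
    ("funeral", "sadness"), ("funeral", "sadness"), ("funeral", "sadness"),
    ("funeral", "pride"),
    ("sports", "joy"), ("sports", "joy"),
    ("sports", "pride"), ("sports", "sadness"),
]

pmi = topic_emotion_pmi(toy)
for (topic, emotion), score in sorted(pmi.items()):
    print(f"{topic:8s} {emotion:8s} {score:+.2f}")
```

In this toy sample, "funeral" and "sadness" receive a positive score while "sports" and "sadness" receive a negative one, which is the kind of skew that a classifier could pick up as a shortcut.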
Feb-2-2024
- Country:
  - North America
    - Dominican Republic (0.04)
    - United States
      - Maryland > Baltimore (0.04)
      - Oregon > Multnomah County
        - Portland (0.04)
      - New Mexico > Santa Fe County
        - Santa Fe (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - California > San Diego County
        - San Diego (0.04)
    - Canada > British Columbia
  - Europe
    - Ireland (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Germany > Baden-Württemberg
      - Stuttgart Region > Stuttgart (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
  - Asia
    - China > Hong Kong (0.04)
    - Taiwan > Taiwan Province
      - Taipei (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.14)
- Genre:
  - Research Report > New Finding (0.46)
- Industry:
  - Health & Medicine > Therapeutic Area (0.46)
- Technology: