The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification

Waldis, Andreas, Birrer, Joel, Lauscher, Anne, Gurevych, Iryna

arXiv.org Artificial Intelligence 

Nevertheless, there is a significant lack of resources to assess the impact of this linguistic shift on classification using language models (LMs), which are probably not trained on such variations. To address this gap, we present Lou, the first dataset featuring high-quality reformulations for German text classification covering seven tasks, like stance detection and toxicity classification. Evaluating 16 mono- and multilingual LMs on Lou shows that gender-fair language substantially impacts predictions by flipping labels, reducing certainty, and altering attention patterns. However, existing evaluations remain valid, as LM rankings of original and reformulated instances do not significantly differ. While we offer initial insights on the effect on German text classification, the findings likely apply to other languages, as consistent

Figure 1: A German stance detection instance from the Lou dataset. We reformulate the masculine formulation Konsumenten (consumers) according to six inclusive or neutral strategies, highlighted in yellow. Translation: Consumers must be well supported.
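As a rough illustration of two of the effects named in the abstract (label flips and the stability of LM rankings), the following minimal sketch shows how they could be quantified. It is not the authors' evaluation code: the function names `flip_rate` and `kendall_tau`, the choice of Kendall's tau as the rank-agreement measure, and all values are assumptions made for this example.

```python
from itertools import combinations


def flip_rate(original_preds, reformulated_preds):
    """Fraction of instances whose predicted label changes after reformulation."""
    if len(original_preds) != len(reformulated_preds):
        raise ValueError("prediction lists must be aligned and equally long")
    if not original_preds:
        return 0.0
    flips = sum(o != r for o, r in zip(original_preds, reformulated_preds))
    return flips / len(original_preds)


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists for the same models.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / n_pairs if n_pairs else 0.0


# Toy, made-up values for illustration only (not results from the paper).
orig = ["favor", "against", "favor", "against"]
reform = ["favor", "favor", "favor", "against"]
print(flip_rate(orig, reform))

# Hypothetical per-model scores on original vs. reformulated instances.
model_scores_orig = [0.71, 0.65, 0.80]
model_scores_reform = [0.69, 0.66, 0.78]
print(kendall_tau(model_scores_orig, model_scores_reform))
```

A high flip rate together with a rank correlation close to 1 would match the pattern the abstract describes: individual predictions change, while the relative ordering of the models stays largely the same.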