The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification

Waldis, Andreas, Birrer, Joel, Lauscher, Anne, Gurevych, Iryna

Sep-26-2024–arXiv.org Artificial Intelligence

Nevertheless, there is a significant lack of resources to assess the impact of this linguistic shift on classification using language models (LMs), which are probably not trained on such variations. To address this gap, we present Lou, the first dataset featuring high-quality reformulations for German text classification covering seven tasks, like stance detection and toxicity classification. Evaluating 16 mono- and multi-lingual LMs on Lou shows that gender-fair language substantially impacts predictions by flipping labels, reducing certainty, and altering attention patterns. However, existing evaluations remain valid, as LM rankings of original and reformulated instances do not significantly differ. While we offer initial insights on the effect on German text classification, the findings likely apply to other languages, as consistent ...

Figure 1: A German stance detection instance from the Lou dataset. We reformulate the masculine formulation Konsumenten (consumers) regarding six inclusive or neutral strategies, highlighted in yellow. Translation: Consumers must be well supported.
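The abstract reports that gender-fair reformulations flip labels and reduce certainty. As a rough illustration of how such effects could be measured, here is a minimal sketch. It is not the authors' evaluation code: the function names, the toy predictions, and the definition of certainty as the probability assigned to the predicted label are all assumptions made for this example.

```python
from typing import Sequence


def flip_rate(original: Sequence[str], reformulated: Sequence[str]) -> float:
    """Share of instances whose predicted label changes after reformulation."""
    if len(original) != len(reformulated):
        raise ValueError("prediction lists must be aligned one-to-one")
    if not original:
        return 0.0
    flips = sum(o != r for o, r in zip(original, reformulated))
    return flips / len(original)


def mean_certainty_change(
    original: Sequence[float], reformulated: Sequence[float]
) -> float:
    """Average change in the probability assigned to the predicted label.

    A negative value means the model is less certain on reformulated text.
    """
    if len(original) != len(reformulated):
        raise ValueError("certainty lists must be aligned one-to-one")
    if not original:
        return 0.0
    deltas = [r - o for o, r in zip(original, reformulated)]
    return sum(deltas) / len(deltas)


# Hypothetical predictions for four stance detection instances
# (made-up values, not taken from the Lou dataset).
orig_labels = ["favor", "against", "favor", "against"]
reform_labels = ["favor", "favor", "favor", "against"]
orig_certainty = [0.9, 0.8, 0.7, 0.6]
reform_certainty = [0.8, 0.6, 0.7, 0.5]

print(f"flip rate: {flip_rate(orig_labels, reform_labels):.2f}")
print(
    "mean certainty change: "
    f"{mean_certainty_change(orig_certainty, reform_certainty):.2f}"
)
```

In this toy setup, one of four labels changes and the average certainty drops, which mirrors the kind of effect the abstract describes. Real numbers would come from running a classifier on aligned original and reformulated instances.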

computational linguistic, gender-fair language, reformulation

Country:
- North America
  - United States
    - Washington > King County
      - Seattle (0.14)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
  - Mexico > Mexico City
    - Mexico City (0.04)
  - Canada
    - Ontario > Toronto (0.04)
    - Quebec > Montreal (0.04)
- Europe
  - Switzerland > Zürich
    - Zürich (0.14)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Middle East > Malta
    - Eastern Region > Northern Harbour District > St. Julian's (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Germany
    - North Rhine-Westphalia > Düsseldorf Region
      - Düsseldorf (0.04)
    - Hesse > Darmstadt Region
      - Darmstadt (0.04)
  - Finland > Pirkanmaa
    - Tampere (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - Singapore (0.04)
  - China > Hong Kong (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
  - Middle East
    - Israel (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
- Africa > Rwanda
  - Kigali > Kigali (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Government > Regional Government > Europe Government (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Text Classification (0.90)
  - Machine Learning > Neural Networks
    - Deep Learning (0.47)