Same Same, But Different: Conditional Multi-Task Learning for Demographic-Specific Toxicity Detection

Soumyajit Gupta, Sooyong Lee, Maria De-Arteaga, Matthew Lease

arXiv.org Artificial Intelligence 

In developing natural language processing (NLP) models to detect toxic language (Arango et al., 2019; Schmidt and Wiegand, 2017; Vaidya et al., 2020), we typically assume that toxic language manifests in similar forms across different targeted groups. For example, HateCheck (Röttger et al., 2021) enumerates templatic patterns such as "I hate [GROUP]" that we expect detection models to handle robustly across groups. Moreover, we typically pool training data across different demographic targets in order to learn general patterns of linguistic toxicity spanning diverse groups. However, the nature and form of toxic language used to target different demographic groups can vary markedly. Furthermore, an imbalanced distribution of demographic groups in toxic language datasets risks over-fitting to the forms of toxic language most relevant to the majority group(s), potentially at the expense of systematically weaker performance on minority group(s). For this reason, a "one-size-fits-all" modeling approach may yield sub-optimal performance and, more specifically, raise concerns of algorithmic fairness (Arango et al., 2019; Park et al., 2018; Sap et al., 2019). At the same time, radically siloing off datasets for each demographic target group would prevent models from learning broader linguistic patterns of toxicity that hold across the groups targeted.
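
To make the tension between pooling and siloing concrete, the sketch below illustrates one way a conditional multi-task model could sit between the two extremes: a shared text encoder trained on pooled data, with a separate toxicity head per targeted demographic group. This is not the paper's implementation; the architecture, hyperparameters, and all names (e.g., ConditionalMultiTaskToxicityModel, NUM_GROUPS) are illustrative assumptions.

# Minimal PyTorch sketch: shared encoder + per-group toxicity heads.
# Illustrative only; not the architecture proposed in the paper.
import torch
import torch.nn as nn

NUM_GROUPS = 4        # hypothetical number of targeted demographic groups
VOCAB_SIZE = 10_000   # hypothetical vocabulary size
EMBED_DIM = 128

class ConditionalMultiTaskToxicityModel(nn.Module):
    """Shared encoder learns general toxicity patterns from pooled data;
    group-specific heads adapt predictions to each targeted group."""

    def __init__(self, vocab_size: int, embed_dim: int, num_groups: int):
        super().__init__()
        # Shared components, trained on data pooled across all groups.
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.shared = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU())
        # One binary toxicity classifier per demographic group.
        self.heads = nn.ModuleList(
            [nn.Linear(embed_dim, 1) for _ in range(num_groups)]
        )

    def forward(self, token_ids: torch.Tensor, group_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer token ids
        # group_ids: (batch,) index of the demographic group targeted in each text
        h = self.shared(self.embedding(token_ids))
        # Compute every head, then select the head matching each example's group.
        all_logits = torch.stack([head(h).squeeze(-1) for head in self.heads], dim=1)
        return all_logits.gather(1, group_ids.unsqueeze(1)).squeeze(1)

if __name__ == "__main__":
    model = ConditionalMultiTaskToxicityModel(VOCAB_SIZE, EMBED_DIM, NUM_GROUPS)
    token_ids = torch.randint(0, VOCAB_SIZE, (8, 32))   # toy batch of 8 texts
    group_ids = torch.randint(0, NUM_GROUPS, (8,))      # targeted group per text
    labels = torch.randint(0, 2, (8,)).float()          # toy toxicity labels
    loss = nn.BCEWithLogitsLoss()(model(token_ids, group_ids), labels)
    loss.backward()
    print(f"toy loss: {loss.item():.4f}")

In this sketch, the shared encoder benefits from all pooled data, while each head can specialize to the forms of toxicity most relevant to its group, so minority-group performance is not dictated solely by majority-group patterns.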
