Same Same, But Different: Conditional Multi-Task Learning for Demographic-Specific Toxicity Detection
Gupta, Soumyajit, Lee, Sooyong, De-Arteaga, Maria, Lease, Matthew
In developing natural language processing (NLP) models to detect toxic language (Arango et al., 2019; Schmidt and Wiegand, 2017; Vaidya et al., 2020), we typically assume that toxic language manifests in similar forms across different targeted groups. For example, HateCheck (Röttger et al., 2021) enumerates templatic patterns such as "I hate [GROUP]" that we expect detection models to handle robustly across groups. Moreover, we typically pool data across different demographic targets in model training in order to learn general patterns of linguistic toxicity across diverse demographic targets. However, the nature and form of toxic language used to target different demographic groups can vary quite markedly. Furthermore, an imbalanced distribution of demographic groups in toxic language datasets risks overfitting to the forms of toxic language most relevant to the majority group(s), potentially resulting in systematically weaker model performance on minority group(s). For this reason, a "one-size-fits-all" modeling approach may yield sub-optimal performance and, more specifically, raise concerns of algorithmic fairness (Arango et al., 2019; Park et al., 2018; Sap et al., 2019). At the same time, radically siloing off a separate dataset for each demographic target group would prevent models from learning broader linguistic patterns of toxicity shared across the different groups targeted.
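The abstract motivates a middle ground between fully pooled training and fully siloed per-group training, and the title points to conditional multi-task learning as that middle ground. Below is a minimal sketch of one way such a model could be structured, assuming a shared encoder (to learn toxicity patterns common to all groups) with lightweight per-group heads (to capture group-specific manifestations) and a pooled fallback head. The group names, toy encoder, dimensions, and all function/parameter choices are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch only: conditional multi-task toxicity detection with a shared
# encoder and demographic-group-specific heads. Not the paper's code.
import torch
import torch.nn as nn

GROUPS = ["women", "black", "lgbtq", "muslim"]  # hypothetical target groups


class ConditionalToxicityModel(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=64):
        super().__init__()
        # Shared encoder: pooled representation learned from all groups' data.
        self.encoder = nn.Sequential(
            nn.EmbeddingBag(vocab_size, embed_dim),  # averages token embeddings
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
        )
        # One small head per demographic group: group-specific toxicity patterns.
        self.heads = nn.ModuleDict({g: nn.Linear(hidden_dim, 1) for g in GROUPS})
        # Pooled head used when the targeted group is unknown.
        self.pooled_head = nn.Linear(hidden_dim, 1)

    def forward(self, input_ids, group=None):
        """input_ids: LongTensor (batch, seq_len); group: str key or None."""
        h = self.encoder(input_ids)
        head = self.heads[group] if group in self.heads else self.pooled_head
        return torch.sigmoid(head(h)).squeeze(-1)  # estimated P(toxic)


# Toy usage: route two examples through the head for a known target group.
model = ConditionalToxicityModel()
ids = torch.randint(0, 30000, (2, 16))
probs = model(ids, group="women")
loss = nn.functional.binary_cross_entropy(probs, torch.tensor([1.0, 0.0]))
```

Under this kind of design, the shared encoder receives gradients from every group's examples, while each head is updated only on examples targeting its group, which is one way to balance cross-group generalization against group-specific specialization.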
arXiv.org Artificial Intelligence
Mar-6-2023