Soft-Label Integration for Robust Toxicity Classification

Cheng, Zelei; Wu, Xian; Yu, Jiahao; Han, Shuo; Cai, Xin-Qiang; Xing, Xinyu

Nov-7-2024, arXiv.org Artificial Intelligence

Toxicity classification in textual content remains a significant problem. Data labeled by a single annotator fall short of capturing the diversity of human perspectives, so there is a growing need to incorporate crowdsourced annotations when training a toxicity classifier. In addition, the standard approach of training a classifier with empirical risk minimization (ERM) can exploit spurious correlations and therefore fail under distribution shift between the training set and the test set. This work introduces a novel bi-level optimization framework that integrates crowdsourced annotations through soft labeling and optimizes the soft-label weights with Group Distributionally Robust Optimization (GroupDRO) to improve robustness against out-of-distribution (OOD) risk. We theoretically prove the convergence of our bi-level optimization algorithm. Experimental results demonstrate that our approach outperforms existing baseline methods in both average and worst-group accuracy, confirming its effectiveness in leveraging crowdsourced annotations for more accurate and robust toxicity classification.
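
The two ingredients named in the abstract, soft labels built from several annotators and a GroupDRO-style reweighting of groups, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the softmax parameterization of annotator weights, and the step size are all assumptions made for illustration, and the group-weight update is the generic exponentiated-gradient form commonly used for GroupDRO.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label(votes, annotator_logits):
    """Combine binary votes (1 = toxic) into one soft label in [0, 1].

    Assumption: each annotator has a learnable logit, and the weights
    are the softmax of those logits.
    """
    weights = softmax(annotator_logits)
    return sum(w * v for w, v in zip(weights, votes))


def soft_cross_entropy(p_toxic, target, eps=1e-12):
    """Binary cross-entropy of a predicted probability against a soft target."""
    p = min(max(p_toxic, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def group_dro_weights(q, group_losses, step_size=1.0):
    """Exponentiated-gradient step: groups with higher loss gain weight."""
    scaled = [qi * math.exp(step_size * li) for qi, li in zip(q, group_losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def robust_loss(q, group_losses):
    """Group-weighted loss that the robust objective would minimize."""
    return sum(qi * li for qi, li in zip(q, group_losses))


if __name__ == "__main__":
    # Three annotators with equal (hypothetical) weights; two vote "toxic".
    y = soft_label([1, 0, 1], [0.0, 0.0, 0.0])
    print("soft label:", y)
    print("loss at p=0.5:", soft_cross_entropy(0.5, y))

    # Two groups; the first currently has the higher loss.
    q = group_dro_weights([0.5, 0.5], [1.0, 0.0])
    print("group weights:", q)
    print("robust loss:", robust_loss(q, [1.0, 0.0]))
```

In a bi-level setup of the kind the abstract describes, an inner loop would fit the classifier on such soft labels while an outer loop adjusts the annotator weights against a robust objective; the sketch above only shows the individual pieces, not that loop.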

large language model, machine learning, natural language

Country:
- Asia (0.28)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (1.00)

Industry:
- Government (0.67)
- Information Technology > Security & Privacy (1.00)
- Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (0.94)
    - Natural Language
      - Chatbot (0.68)
      - Large Language Model (0.71)
    - Representation & Reasoning > Optimization (1.00)
  - Communications > Social Media
    - Crowdsourcing (0.88)