Exploring the Plausibility of Hate and Counter Speech Detectors with Explainable AI

Böck, Adrian Jaques, Slijepčević, Djordje, Zeppelzauer, Matthias

Jul-25-2024–arXiv.org Artificial Intelligence

In this paper, we investigate the explainability of transformer models and their plausibility for hate speech and counter speech detection. We compare representatives of four explainability approaches (gradient-based, perturbation-based, attention-based, and prototype-based) and analyze them quantitatively in an ablation study and qualitatively in a user study. Results show that perturbation-based explainability performs best, followed by gradient-based and attention-based explainability. Prototype-based experiments did not yield useful results. Overall, we observe that explainability strongly supports users in better understanding the model predictions.
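
To make the idea of a perturbation-based explanation and a token ablation check concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify the models, masking strategy, or metrics, so `toy_score` (a keyword-counting stand-in for a transformer classifier), the `[MASK]` placeholder, and the function names `occlusion_attributions` and `ablation_drop` are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple


def toy_score(tokens: List[str]) -> float:
    """Hypothetical stand-in for a classifier: share of tokens in a tiny lexicon."""
    lexicon = {"hate", "stupid"}  # assumption: illustrative only
    if not tokens:
        return 0.0
    return sum(t.lower() in lexicon for t in tokens) / len(tokens)


def occlusion_attributions(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
    mask: str = "[MASK]",
) -> List[Tuple[str, float]]:
    """Attribute to each token the score drop observed when it alone is masked."""
    base = score_fn(tokens)
    result = []
    for i, tok in enumerate(tokens):
        perturbed = tokens[:i] + [mask] + tokens[i + 1:]
        result.append((tok, base - score_fn(perturbed)))
    return result


def ablation_drop(
    tokens: List[str],
    attributions: List[Tuple[str, float]],
    score_fn: Callable[[List[str]], float],
    k: int = 1,
    mask: str = "[MASK]",
) -> float:
    """Mask the k highest-attributed tokens and return the resulting score drop."""
    order = sorted(range(len(tokens)), key=lambda i: attributions[i][1], reverse=True)
    top = set(order[:k])
    ablated = [mask if i in top else t for i, t in enumerate(tokens)]
    return score_fn(tokens) - score_fn(ablated)


if __name__ == "__main__":
    sentence = ["I", "hate", "this", "stupid", "idea"]
    attrs = occlusion_attributions(sentence, toy_score)
    for token, weight in attrs:
        print(f"{token:>8s}  {weight:+.2f}")
    print("score drop after masking top-2 tokens:",
          round(ablation_drop(sentence, attrs, toy_score, k=2), 2))
```

In this toy setting, a larger score drop after masking the top-ranked tokens suggests that the explanation points at tokens the scorer actually relies on; a real study would replace `toy_score` with the trained model's class probability.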

dataset, explanation, xai method

Country:
- North America
  - Canada (0.04)
  - United States > New York
    - New York County > New York City (0.04)
- Europe
  - Austria > Vienna (0.04)
  - Portugal > Braga
    - Braga (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Iceland > Capital Region
    - Reykjavik (0.05)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - China > Hong Kong (0.04)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology
  - Communications > Social Media (1.00)
  - Artificial Intelligence
    - Representation & Reasoning (0.93)
    - Issues > Social & Ethical Issues (0.65)
    - Natural Language > Explanation & Argumentation (0.65)
    - Machine Learning > Neural Networks
      - Deep Learning (0.69)