 implicit hate speech


Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech

arXiv.org Artificial Intelligence

Employing language models to generate explanations for an incoming implicit hate post is an active area of research. The explanation is intended to make explicit the underlying stereotype and aid content moderators. The training often combines top-k relevant knowledge graph (KG) tuples to provide world knowledge and improve performance on standard metrics. Interestingly, our study presents conflicting evidence for the role of the quality of KG tuples in generating implicit explanations. Consequently, simpler models incorporating external toxicity signals outperform KG-infused models. Compared to the KG-based setup, we observe a comparable performance for SBIC (LatentHatred) datasets with a performance variation of +0.44 (+0.49), +1.83 (-1.56), and -4.59 (+0.77) in BLEU, ROUGE-L, and BERTScore. Further human evaluation and error analysis reveal that our proposed setup produces more precise explanations than zero-shot GPT-3.5, highlighting the intricate nature of the task.


Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech

arXiv.org Artificial Intelligence

Recent studies have raised the alarm that much online hate speech is implicit. Because such speech is subtle, explaining why it was detected as hateful has been a challenging problem. In this work, we examine whether ChatGPT can be used to provide natural language explanations (NLEs) for implicit hate speech detection. We design our prompt to elicit concise ChatGPT-generated NLEs and conduct user studies to evaluate their quality by comparison with human-written NLEs. We discuss the potential and limitations of ChatGPT in the context of implicit hate speech research.


Chain of Explanation: New Prompting Method to Generate Higher Quality Natural Language Explanation for Implicit Hate Speech

arXiv.org Artificial Intelligence

Recent studies have exploited advanced generative language models to generate Natural Language Explanations (NLE) for why a certain text could be hateful. We propose the Chain of Explanation (CoE) Prompting method, using the heuristic words and target group, to generate high-quality NLE for implicit hate speech. We improved the BLEU score from 44.0 to 62.3 for NLE generation by providing accurate target information. We then evaluate the quality of generated NLE using various automatic metrics and human annotations.

The potential of sequence-to-sequence (Seq2Seq) models and prompting methods has not been fully explored [4]. Moreover, traditional evaluation metrics, such as BLEU [20] and ROUGE [18], applied in NLE generation for hate speech, may also not be able to comprehensively capture the quality of the generated explanations because they rely heavily on word-level overlaps [3]. To fill those gaps, we propose a Chain of Explanations (CoE) prompt method to generate high-quality NLE distinguishing implicit hate speech from non-hateful tweets.
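
The concern about word-level overlap can be illustrated with a small sketch. It is not the evaluation code of any of the papers listed here, and it is not full BLEU (there is no brevity penalty and no averaging over n-gram orders). It computes only clipped unigram precision, the building block that BLEU is based on. The function names and the example sentences are hypothetical, chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the reference (the "clipping" used by BLEU).
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical reference explanation and two candidate explanations.
reference = "the post implies that a group is dangerous"
paraphrase = "this message suggests those people pose a threat"
wrong_meaning = "the post implies that a group is lazy"

print(clipped_precision(paraphrase, reference))     # 0.125
print(clipped_precision(wrong_meaning, reference))  # 0.875
```

In this toy case, the paraphrase that preserves the meaning shares only one word with the reference and scores 0.125, while the candidate that changes the stereotype but reuses the wording scores 0.875. This is the kind of mismatch that motivates complementing overlap metrics with human annotation.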