Intent-conditioned and Non-toxic Counterspeech Generation using Multi-Task Instruction Tuning with RLAIF

Hengle, Amey, Kumar, Aswini, Singh, Sahajpreet, Bandhakavi, Anil, Akhtar, Md Shad, Chakraborty, Tanmoy

Mar-15-2024–arXiv.org Artificial Intelligence

Counterspeech, defined as a response that mitigates online hate speech, is increasingly used as a non-censorial solution. Addressing hate speech effectively involves dispelling the stereotypes, prejudices, and biases that are often only subtly implied in brief, single-sentence statements or abuses. These implicit expressions challenge language models, especially in seq2seq tasks, because models typically perform better with longer contexts. Our study introduces CoARL, a novel framework that enhances counterspeech generation by modeling the pragmatic implications underlying social biases in hateful statements. CoARL's first two phases involve sequential multi-instruction tuning: the model first learns to understand the intents, reactions, and harms of offensive statements, and then learns task-specific low-rank adapter weights for generating intent-conditioned counterspeech. The final phase uses reinforcement learning to fine-tune outputs for effectiveness and non-toxicity. CoARL outperforms existing benchmarks in intent-conditioned counterspeech generation, with an average improvement of 3 points in intent-conformity and 4 points in argument-quality metrics. Extensive human evaluation supports CoARL's efficacy in generating superior and more context-appropriate responses than existing systems, including prominent LLMs such as ChatGPT.
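
The abstract mentions task-specific low-rank adapter weights without giving details. As a rough illustration only, the sketch below shows the general idea of a low-rank adapter in its commonly used form, where a frozen weight matrix W is augmented by a scaled product of two small matrices B and A. The function names, the toy dimensions, and the `alpha / r` scaling convention are assumptions made for this example; it is not the authors' implementation.

```python
# Illustrative sketch of a low-rank adapter update on a tiny weight matrix.
# Assumption: the common formulation W_eff = W + (alpha / r) * (B @ A).
# All names below are hypothetical and not taken from the paper.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def adapted_weight(W, A, B, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the adapter rank.

    W is d_out x d_in (frozen), B is d_out x r, A is r x d_in (trainable).
    """
    r = len(A)                      # rank = number of rows of A
    scale = alpha / r
    delta = matmul(B, A)            # d_out x d_in low-rank update
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def apply(W, x):
    """Compute the matrix-vector product W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Toy example: d_out = 2, d_in = 3, rank r = 1.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]
B_zero = [[0.0], [0.0]]             # a zero B leaves W unchanged
B = [[0.5], [-1.0]]                 # stands in for B after some training
x = [1.0, 1.0, 1.0]

base_out = apply(adapted_weight(W, A, B_zero, alpha=2.0), x)
tuned_out = apply(adapted_weight(W, A, B, alpha=2.0), x)
```

In this toy setup the adapter trains r * (d_in + d_out) = 5 values instead of the 6 entries of W; the saving only becomes significant at realistic layer sizes, which is why separate adapters per task can be kept alongside one shared base model.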

counterspeech, counterspeech generation, speech, (13 more...)

Country:
- North America
  - United States > Michigan (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe > Spain
  - Catalonia > Barcelona Province > Barcelona (0.04)
- Asia
  - Middle East > UAE (0.04)
  - China > Hong Kong (0.04)
  - Japan > Kyūshū & Okinawa
    - Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
  - India > NCT
    - Delhi (0.04)

Genre:
- Research Report (1.00)

Industry:
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
