CSEval: Towards Automated, Multi-Dimensional, and Reference-Free Counterspeech Evaluation using Auto-Calibrated LLMs

Amey Hengle, Aswini Kumar, Anil Bandhakavi, Tanmoy Chakraborty

Feb-9-2025–arXiv.org Artificial Intelligence

Counterspeech has emerged as a popular and effective strategy for combating online hate speech, sparking growing research interest in automating its generation using language models. However, the field still lacks standardised evaluation protocols and reliable automated evaluation metrics that align with human judgement. Current automatic evaluation methods, primarily based on similarity metrics, do not effectively capture the complex and independent attributes of counterspeech quality, such as contextual relevance, aggressiveness, or argumentative coherence. This has led to an increased dependency on labor-intensive human evaluations to assess automated counterspeech generation methods. To address these challenges, we introduce CSEval, a novel dataset and framework for evaluating counterspeech quality across four dimensions: contextual-relevance, aggressiveness, argument-coherence, and suitableness. Furthermore, we propose Auto-Calibrated CoT for Counterspeech Evaluation (Auto-CSEval), a prompt-based method that uses auto-calibrated chain-of-thought (CoT) reasoning to score counterspeech with large language models. Our experiments show that Auto-CSEval outperforms traditional metrics like ROUGE, METEOR, and BERTScore in correlating with human judgement, indicating a significant improvement in automated counterspeech evaluation.

counterspeech, large language model, machine learning

Country:
- Asia
  - China > Hong Kong (0.04)
  - India > NCT
    - Delhi (0.04)
  - Japan > Kyūshū & Okinawa
    - Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
- Europe
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Italy (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
- North America
  - Canada > Ontario
    - Toronto (0.04)
  - United States
    - California > Los Angeles County
      - Los Angeles (0.04)
    - District of Columbia > Washington (0.04)
    - Gulf of Mexico > Central GOM (0.04)
    - Michigan (0.04)
    - Pennsylvania (0.04)
    - Washington > King County
      - Seattle (0.04)

Genre:
- Research Report (1.00)

Industry:
- Law > Civil Rights & Constitutional Law (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.70)
  - Natural Language > Large Language Model (1.00)
