Auditing Counterfire: Evaluating Advanced Counterargument Generation with Evidence and Style
Verma, Preetika, Jaidka, Kokil, Churina, Svetlana
arXiv.org Artificial Intelligence
We audited large language models (LLMs) for their ability to create evidence-based and stylistic counter-arguments to posts from the Reddit ChangeMyView dataset. We benchmarked their rhetorical quality across a host of qualitative and quantitative metrics and then evaluated their persuasive abilities against human counter-arguments. Our evaluation is based on Counterfire, a new dataset of 32,000 counter-arguments generated by GPT-3.5 Turbo and Koala, their fine-tuned variants, and PaLM 2, with varying prompts for evidence use and argumentative style. GPT-3.5 Turbo ranked highest in argument quality, with strong paraphrasing and style adherence, particularly in 'reciprocity' style arguments. However, the stylistic counter-arguments still fall short of human persuasive standards, and people also preferred reciprocal rebuttals to evidence-based ones. The findings suggest that a balance between evidentiality and stylistic elements is vital to a compelling counter-argument. We close with a discussion of future research directions and implications for evaluating LLM outputs.
Apr-19-2024
- Country:
  - Africa > Niger (0.04)
  - Asia
    - India (0.04)
    - Middle East > Jordan (0.04)
    - Russia (0.04)
    - Singapore (0.04)
  - Europe
    - Bulgaria > Sofia City Province
      - Sofia (0.04)
    - Germany > Hamburg (0.04)
    - Russia (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
  - North America
    - Canada > Ontario
      - Toronto (0.04)
    - United States
      - New York (0.04)
      - Oregon > Multnomah County
        - Portland (0.04)
      - Pennsylvania (0.04)
  - Oceania > Australia
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Government (1.00)
  - Health & Medicine > Therapeutic Area
    - Immunology (0.93)
  - Law (1.00)
  - Media > News (1.00)
- Technology: