counter-argument
Utilising Large Language Models for Generating Effective Counter Arguments to Anti-Vaccine Tweets
Dhanuka, Utsav, Poddar, Soham, Ghosh, Saptarshi
In an era where public health is increasingly influenced by information shared on social media, combatting vaccine skepticism and misinformation has become a critical societal goal. Misleading narratives around vaccination have spread widely, creating barriers to achieving high immunisation rates and undermining trust in health recommendations. While efforts to detect misinformation have made significant progress, the generation of real-time counter-arguments tailored to debunk such claims remains an insufficiently explored area. In this work, we explore the capabilities of LLMs to generate sound counter-argument rebuttals to vaccine misinformation. Building on prior research in misinformation debunking, we experiment with various prompting strategies and fine-tuning approaches to optimise counter-argument generation. Additionally, we train classifiers to categorise anti-vaccine tweets into multi-label categories, such as concerns about vaccine efficacy, side effects, and political influences, allowing for more context-aware rebuttals. Our evaluation, conducted through human judgment, LLM-based assessments, and automatic metrics, reveals strong alignment across these methods. Our findings demonstrate that integrating label descriptions and structured fine-tuning enhances counter-argument effectiveness, offering a promising approach for mitigating vaccine misinformation at scale.
- North America > United States (0.93)
- Europe > United Kingdom (0.14)
- Asia > India > West Bengal > Kharagpur (0.04)
- Health & Medicine > Therapeutic Area > Vaccines (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
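As a concrete illustration of the label-aware prompting described in the entry above, here is a minimal sketch assuming a hypothetical three-label taxonomy and prompt wording of our own; it is not the authors' implementation.

```python
# Sketch (not the authors' code): fold predicted multi-label concern
# categories and their descriptions into a counter-argument prompt.
# The taxonomy and all wording below are illustrative assumptions.

LABEL_DESCRIPTIONS = {
    "efficacy": "doubts that vaccines actually prevent disease",
    "side_effects": "fear of adverse reactions or long-term harm",
    "political": "belief that vaccination policy is politically motivated",
}

def build_rebuttal_prompt(tweet: str, labels: list[str]) -> str:
    """Compose a label-aware prompt for counter-argument generation."""
    concerns = "\n".join(
        f"- {label}: {LABEL_DESCRIPTIONS[label]}" for label in labels
    )
    return (
        "The following tweet expresses vaccine hesitancy.\n"
        f"Tweet: {tweet}\n\n"
        "It raises these specific concerns:\n"
        f"{concerns}\n\n"
        "Write a concise, factual, and empathetic counter-argument "
        "that addresses each concern directly."
    )

if __name__ == "__main__":
    tweet = "Vaccines were rushed and we have no idea what they do long-term."
    print(build_rebuttal_prompt(tweet, ["efficacy", "side_effects"]))
```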
DS@GT at Touché: Large Language Models for Retrieval-Augmented Debate
Miyaguchi, Anthony, Johnston, Conor, Potdar, Aaryan
Large Language Models (LLMs) demonstrate strong conversational abilities. In this working paper, we study them in the context of debating in two ways: their ability to perform in a structured debate when given a dataset of arguments to draw on, and their ability to evaluate utterances throughout the debate. We deploy six leading publicly available models from three providers for the Retrieval-Augmented Debate and Evaluation. The evaluation is performed by measuring four key metrics: Quality, Quantity, Manner, and Relation. Throughout this task, we found that although LLMs perform well in debates when given relevant arguments, they tend to be verbose in their responses yet consistent in their evaluations. The accompanying source code for this paper is located at https://github.com/dsgt-arc/touche-2025-rad.
- North America > United States > New York (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
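The retrieval-augmented debate setup above can be sketched as a single turn: rank a small argument pool against the opponent's utterance, then prompt for a rebuttal constrained by the four reported metrics. The overlap-based ranker and prompt text below are stand-ins, not the DS@GT pipeline.

```python
# Illustrative single debate turn: naive retrieval plus a rebuttal prompt
# referencing the four Gricean metrics (Quality, Quantity, Manner, Relation).

def retrieve(query: str, arguments: list[str], k: int = 2) -> list[str]:
    """Rank candidate arguments by naive token overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(arguments, key=lambda a: -len(q & set(a.lower().split())))
    return scored[:k]

def debate_prompt(opponent_utterance: str, evidence: list[str]) -> str:
    bullets = "\n".join(f"- {e}" for e in evidence)
    return (
        f"Opponent said: {opponent_utterance}\n"
        f"Relevant arguments:\n{bullets}\n"
        "Reply with a brief rebuttal that is truthful (Quality), "
        "concise (Quantity), clear (Manner), and on-topic (Relation)."
    )

corpus = [
    "Randomized trials show large reductions in severe outcomes.",
    "Surveillance systems track adverse events after approval.",
    "Debate formats reward brevity over repetition.",
]
opp = "Those trials were too short to trust."
print(debate_prompt(opp, retrieve(opp, corpus)))
```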
Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making
Identifying bias in LLMs is an ongoing effort. Because the models are still in development, what is true today may be false tomorrow. We therefore need general debiasing strategies that will outlive current models. Strategies developed for debiasing human decision making offer one promising approach, as they incorporate an LLM-style prompt intervention designed to bring latent knowledge into awareness during decision making. LLMs trained on vast amounts of information contain information about potential biases, counter-arguments, and contradictory evidence, but that information may only be brought to bear if prompted for. Metacognitive prompts developed in the human decision making literature are designed to achieve this, and as I demonstrate here, they show promise with LLMs. The prompt I focus on here is "could you be wrong?" Following an LLM response, this prompt leads LLMs to produce additional information, including why they answered as they did, errors, biases, contradictory evidence, and alternatives, none of which were apparent in their initial response. Indeed, this metaknowledge often reveals that LLMs and users interpret prompts in ways that are not aligned. Here I demonstrate this prompt using a set of questions taken from recent articles about LLM biases, including implicit discriminatory biases and failures of metacognition. "Could you be wrong?" prompts the LLM to identify its own biases and produce cogent metacognitive reflection. I also present another example involving convincing but incomplete information, which is readily corrected by the metacognitive prompt. In sum, this work argues that human psychology offers a new avenue for prompt engineering, leveraging a long history of effective prompt-based improvements to human decision making.
- Europe > United Kingdom > England > West Midlands > Coventry (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
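The intervention above is a two-turn conversation pattern, sketched below with a stubbed chat function in the generic role/content message format; no specific vendor API is implied.

```python
# Two-turn "could you be wrong?" intervention. `chat` is a stand-in for
# any chat-completion call; replace it with a real client to use it.

def chat(messages: list[dict]) -> str:
    """Placeholder for a real LLM call; echoes for demonstration."""
    return f"[model reply to: {messages[-1]['content']!r}]"

history = [{"role": "user", "content": "Which candidate should we hire?"}]
first = chat(history)
history.append({"role": "assistant", "content": first})

# The metacognitive follow-up: surface biases, counter-evidence, alternatives.
history.append({"role": "user", "content": "Could you be wrong?"})
reflection = chat(history)

print(first)
print(reflection)
```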
Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Generation with Large Language Models
Yeginbergen, Anar, Oronoz, Maite, Agerri, Rodrigo
This paper investigates the role of dynamic external knowledge integration in improving counter-argument generation using Large Language Models (LLMs). While LLMs have shown promise in argumentative tasks, their tendency to generate lengthy, potentially unfactual responses highlights the need for more controlled and evidence-based approaches. We introduce a new manually curated dataset of argument and counter-argument pairs specifically designed to balance argumentative complexity with evaluative feasibility. We also propose a new LLM-as-a-Judge evaluation methodology that shows a stronger correlation with human judgments compared to traditional reference-based metrics. Our experimental results demonstrate that integrating dynamic external knowledge from the web significantly improves the quality of generated counter-arguments, particularly in terms of relatedness, persuasiveness, and factuality. The findings suggest that combining LLMs with real-time external knowledge retrieval offers a promising direction for developing more effective and reliable counter-argumentation systems.
- North America > Canada (0.28)
- Europe > Spain (0.28)
- Oceania > Australia (0.14)
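A rough sketch of the pipeline above: ground the counter-argument in retrieved snippets, then have a second model grade it on the three reported dimensions. Both prompt templates are assumptions, not the paper's wording.

```python
# Evidence-grounded generation plus an LLM-as-a-Judge rubric, loosely
# following the pipeline described above. Templates are illustrative.

def counter_prompt(argument: str, snippets: list[str]) -> str:
    evidence = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        f"Argument: {argument}\n"
        f"Web evidence:\n{evidence}\n"
        "Write a counter-argument that cites the evidence by number "
        "and avoids unsupported claims."
    )

def judge_prompt(argument: str, counter: str) -> str:
    return (
        f"Argument: {argument}\nCounter-argument: {counter}\n"
        "Rate the counter-argument from 1-5 on relatedness, "
        "persuasiveness, and factuality. Return three integers."
    )

snippets = ["Large randomized trials report high efficacy against severe disease."]
print(counter_prompt("Vaccines do more harm than good.", snippets))
```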
Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks
Lin, Jiayu, Chen, Guanrong, Jin, Bojun, Li, Chenyang, Jia, Shutong, Lin, Wancong, Sun, Yang, He, Yuhang, Yang, Caihua, Bao, Jianzhu, Wu, Jipeng, Su, Wen, Chen, Jinglu, Li, Xinyi, Chen, Tianyu, Han, Mingjie, Du, Shuaiwen, Wang, Zijian, Li, Jiyin, Suo, Fuzhong, Wang, Hao, Lin, Nuanchen, Huang, Xuanjing, Jiang, Changjian, Xu, RuiFeng, Zhang, Long, Cao, Jiuxin, Jin, Ting, Wei, Zhongyu
Argument and debate are fundamental capabilities of human intelligence, essential for a wide range of human activities, and common to all human societies. Argumentation [1, 2, 3] takes the human process of logical argumentation as its object of study, and is a research field involving logic, philosophy, language, rhetoric, computer science, and education. Striving to enable models to automatically understand and generate argumentative texts, computational argumentation, a newly emerging research field, is attracting increasing attention from the research community [4]. Depending on the task objective, computational argumentation can be divided into two areas: argument mining and argument generation. With the rapid development of modern technology, online forums like ChangeMyView allow people to freely exchange opinions on specific topics, making them suitable data sources for argument generation tasks, and especially for designing artificial debaters, since such forums closely resemble real-world debates. Initial research in this field has focused on analyzing ChangeMyView data [5, 6] to summarize the key factors of persuasive arguments.
ArguMentor: Augmenting User Experiences with Counter-Perspectives
Opinion pieces (or op-eds) can provide valuable perspectives, but they often represent only one side of a story, which can make readers susceptible to confirmation bias and echo chambers. Exposure to different perspectives can help readers overcome these obstacles and form more robust, nuanced views on important societal issues. We designed ArguMentor, a human-AI collaboration system that highlights claims in opinion pieces, identifies counter-arguments for them using an LLM, and generates a context-based summary grounded in current events. It further enhances user understanding through additional features such as a Q&A bot (which answers user questions pertaining to the text), DebateMe (an agent with which users can argue either side of the piece), and highlighting (where users can highlight a word or passage to get its definition or context). Our evaluation shows that participants generate more arguments and counter-arguments and, on average, hold more moderate views after engaging with the system.
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Virginia (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
- Health & Medicine (1.00)
- Media > News (0.93)
- Government (0.68)
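One way to picture ArguMentor's claim/counter pairing is as a small data structure plus a claim detector; the fields and the toy heuristic below are guesses at the shape of such a system, not its published interface.

```python
# Hypothetical claim annotation structure for an ArguMentor-style reader;
# field names and the naive claim detector are assumptions.

from dataclasses import dataclass

@dataclass
class ClaimAnnotation:
    claim: str    # span highlighted in the op-ed
    counter: str  # LLM-generated counter-perspective
    context: str  # short current-events summary shown to readers

def naive_claim_sentences(text: str) -> list[str]:
    """Flag assertive sentences as claim candidates (toy heuristic)."""
    markers = ("should", "must", "clearly", "obviously")
    return [s.strip() for s in text.split(".")
            if any(m in s.lower() for m in markers)]

oped = "Cities must ban cars downtown. Traffic grew last year."
for claim in naive_claim_sentences(oped):
    print(ClaimAnnotation(claim=claim, counter="<LLM counter>", context="<summary>"))
```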
Auditing Counterfire: Evaluating Advanced Counterargument Generation with Evidence and Style
Verma, Preetika, Jaidka, Kokil, Churina, Svetlana
We audited large language models (LLMs) for their ability to create evidence-based and stylistic counter-arguments to posts from the Reddit ChangeMyView dataset. We benchmarked their rhetorical quality across a host of qualitative and quantitative metrics, and ultimately evaluated their persuasive abilities against human counter-arguments. Our evaluation is based on Counterfire, a new dataset of 32,000 counter-arguments generated by LLMs: GPT-3.5 Turbo, Koala (and their fine-tuned variants), and PaLM 2, with varying prompts for evidence use and argumentative style. GPT-3.5 Turbo ranked highest in argument quality, with strong paraphrasing and style adherence, particularly in 'reciprocity'-style arguments. However, the stylistic counter-arguments still fall short of human persuasive standards, and human readers also preferred reciprocal over evidence-based rebuttals. The findings suggest that a balance between evidentiality and stylistic elements is vital to a compelling counter-argument. We close with a discussion of future research directions and implications for evaluating LLM outputs.
- Asia > Singapore (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Pennsylvania (0.04)
- Media > News (1.00)
- Law (1.00)
- Government (1.00)
- Health & Medicine > Therapeutic Area > Immunology (0.93)
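The audit above crosses evidence use with argumentative style; a minimal sketch of such a prompt grid follows, with template wording that is assumed rather than taken from the Counterfire construction.

```python
# Hypothetical style-conditioned prompt grid in the spirit of the audit
# above; the instructions per style are our own stand-ins.

STYLES = {
    "reciprocity": "Acknowledge the poster's view, then respectfully rebut it.",
    "evidence": "Rebut using verifiable facts and cite sources inline.",
}

def counterfire_prompt(post: str, style: str) -> str:
    return f"{STYLES[style]}\n\nChangeMyView post: {post}\n\nCounter-argument:"

post = "CMV: Standardized tests are the fairest way to rank students."
for style in STYLES:
    print(f"--- {style} ---")
    print(counterfire_prompt(post, style))
```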
Argue with Me Tersely: Towards Sentence-Level Counter-Argument Generation
Lin, Jiayu, Ye, Rong, Han, Meng, Zhang, Qi, Lai, Ruofei, Zhang, Xinyu, Cao, Zhao, Huang, Xuanjing, Wei, Zhongyu
Counter-argument generation, a captivating area in computational linguistics, seeks to craft statements that offer opposing views. While most research has ventured into paragraph-level generation, sentence-level counter-argument generation poses its own constraints and brevity-focused challenges. Furthermore, the diverse nature of counter-arguments makes it difficult to evaluate model performance solely with n-gram-based metrics. In this paper, we present the ArgTersely benchmark for sentence-level counter-argument generation, drawing on a manually annotated dataset from the ChangeMyView debate forum. We also propose Arg-LlaMA for generating high-quality counter-arguments. For better evaluation, we train a BERT-based evaluator, Arg-Judge, with human preference data. We conduct comparative experiments involving various baselines such as LlaMA, Alpaca, GPT-3, and others. The results show the competitiveness of our proposed framework and evaluator in counter-argument generation tasks. Code and data are available at https://github.com/amazingljy1206/ArgTersely.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
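Arg-Judge is described above as a BERT-based evaluator trained on human preference data; a toy pairwise-ranking training step in that spirit is sketched below, with random embeddings standing in for encoded counter-arguments. The real architecture and features differ.

```python
# Toy pairwise preference training: the scorer should assign higher
# scores to human-preferred counter-arguments than to rejected ones.

import torch
import torch.nn as nn

scorer = nn.Sequential(nn.Linear(768, 128), nn.ReLU(), nn.Linear(128, 1))
loss_fn = nn.MarginRankingLoss(margin=0.5)
opt = torch.optim.Adam(scorer.parameters(), lr=1e-4)

# Stand-ins for sentence embeddings of (preferred, rejected) counters.
preferred = torch.randn(32, 768)
rejected = torch.randn(32, 768)

target = torch.ones(32, 1)  # preferred should outscore rejected
loss = loss_fn(scorer(preferred), scorer(rejected), target)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```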
Contextualizing Argument Quality Assessment with Relevant Knowledge
Deshpande, Darshan, Sourati, Zhivar, Ilievski, Filip, Morstatter, Fred
Automatic assessment of the quality of arguments has been recognized as a challenging task with significant implications for misinformation and targeted speech. While real-world arguments are tightly anchored in context, existing computational methods analyze their quality in isolation, which affects their accuracy and generalizability. We propose SPARK: a novel method for scoring argument quality based on contextualization via relevant knowledge. We devise four augmentations that leverage large language models to provide feedback, infer hidden assumptions, supply a similar-quality argument, or give a counter-argument. SPARK uses a dual-encoder Transformer architecture to enable the original argument and its augmentation to be considered jointly. Our experiments in both in-domain and zero-shot setups show that SPARK consistently outperforms existing techniques across multiple metrics.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California (0.14)
- Europe > Germany > Berlin (0.04)
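SPARK's dual-encoder idea, with the argument and one LLM-produced augmentation (feedback, hidden assumption, similar argument, or counter-argument) encoded separately and scored jointly, can be sketched as follows; dimensions, depth, and mean pooling are illustrative assumptions.

```python
# Schematic dual-encoder quality scorer in the spirit of SPARK;
# not the paper's implementation.

import torch
import torch.nn as nn

class DualEncoderScorer(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.arg_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.aug_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(2 * dim, 1)  # joint quality score

    def forward(self, arg_tokens, aug_tokens):
        arg = self.arg_encoder(arg_tokens).mean(dim=1)  # mean-pool tokens
        aug = self.aug_encoder(aug_tokens).mean(dim=1)
        return self.head(torch.cat([arg, aug], dim=-1))

model = DualEncoderScorer()
score = model(torch.randn(2, 40, 256), torch.randn(2, 40, 256))
print(score.shape)  # torch.Size([2, 1])
```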
Conclusion-based Counter-Argument Generation
Alshomary, Milad, Wachsmuth, Henning
In real-world debates, the most common way to counter an argument is to reason against its main point, that is, its conclusion. Existing work on the automatic generation of natural language counter-arguments does not address the relation to the conclusion, possibly because many arguments leave their conclusion implicit. In this paper, we hypothesize that the key to effective counter-argument generation is to explicitly model the argument's conclusion and to ensure that the stance of the generated counter is opposite to that conclusion. In particular, we propose a multitask approach that jointly learns to generate both the conclusion and the counter of an input argument. The approach employs a stance-based ranking component that selects the counter from a diverse set of generated candidates whose stance best opposes the generated conclusion. In both automatic and manual evaluation, we provide evidence that our approach generates more relevant and stance-adhering counters than strong baselines.
- Oceania > Australia (0.04)
- North America > United States > Colorado > Denver County > Denver (0.04)
- North America > United States > California (0.04)
- North America > Dominican Republic (0.04)
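The stance-based ranking component above can be sketched as selecting, among candidate counters, the one whose stance score most opposes the generated conclusion; the negation-counting stub below stands in for a real stance classifier.

```python
# Stance-based candidate selection, loosely following the approach above.
# `stance_score` is a toy stand-in for a trained stance model.

def stance_score(conclusion: str, candidate: str) -> float:
    """Stand-in stance model: +1 supports the conclusion, -1 opposes it."""
    negations = {"not", "no", "never", "wrong", "however"}
    hits = sum(w in negations for w in candidate.lower().split())
    return -min(hits, 3) / 3.0  # more negation -> stronger opposition

def select_counter(conclusion: str, candidates: list[str]) -> str:
    """Pick the candidate whose stance most opposes the conclusion."""
    return min(candidates, key=lambda c: stance_score(conclusion, c))

candidates = [
    "Remote work does boost output for many teams.",
    "That is not true: output does not drop, and never has in our data.",
]
print(select_counter("Remote work lowers productivity.", candidates))
```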