argument mining
- North America > United States > Texas (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- Oceania > Australia > New South Wales (0.04)
- (11 more...)
Winning with Less for Low Resource Languages: Advantage of Cross-Lingual English_Persian Argument Mining Model over LLM Augmentation
Jahan, Ali, Ghayoomi, Masood, Hautli-Janisz, Annette
Argument mining is a subfield of natural language processing to identify and extract the argument components, like premises and conclusions, within a text and to recognize the relations between them. It reveals the logical structure of texts to be used in tasks like knowledge extraction. This paper aims at utilizing a cross-lingual approach to argument mining for low-resource languages, by constructing three training scenarios. We examine the models on English, as a high-resource language, and Persian, as a low-resource language. To this end, we evaluate the models based on the English Microtext corpus \citep{PeldszusStede2015}, and its parallel Persian translation. The learning scenarios are as follow: (i) zero-shot transfer, where the model is trained solely with the English data, (ii) English-only training enhanced by synthetic examples generated by Large Language Models (LLMs), and (iii) a cross-lingual model that combines the original English data with manually translated Persian sentences. The zero-shot transfer model attains F1 scores of 50.2\% on the English test set and 50.7\% on the Persian test set. LLM-based augmentation model improves the performance up to 59.2\% on English and 69.3\% on Persian. The cross-lingual model, trained on both languages but evaluated solely on the Persian test set, surpasses the LLM-based variant, by achieving a F1 of 74.8\%. Results indicate that a lightweight cross-lingual blend can outperform considerably the more resource-intensive augmentation pipelines, and it offers a practical pathway for the argument mining task to overcome data resource shortage on low-resource languages.
- North America > United States (1.00)
- Europe (1.00)
- Asia > Middle East > Iran (0.14)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.67)
Large Language Models in Argument Mining: A Survey
Li, Hao, Schlegel, Viktor, Sun, Yizheng, Batista-Navarro, Riza, Nenadic, Goran
Large Language Models (LLMs) have fundamentally reshaped Argument Mining (AM), shifting it from a pipeline of supervised, task-specific classifiers to a spectrum of prompt-driven, retrieval-augmented, and reasoning-oriented paradigms. Yet existing surveys largely predate this transition, leaving unclear how LLMs alter task formulations, dataset design, evaluation methodology, and the theoretical foundations of computational argumentation. In this survey, we synthesise research and provide the first unified account of AM in the LLM era. We revisit canonical AM subtasks, i.e., claim and evidence detection, relation prediction, stance classification, argument quality assessment, and argumentative summarisation, and show how prompting, chain-of-thought reasoning, and in-context learning blur traditional task boundaries. We catalogue the rapid evolution of resources, including integrated multi-layer corpora and LLM-assisted annotation pipelines that introduce new opportunities as well as risks of bias and evaluation circularity. Building on this mapping, we identify emerging architectural patterns across LLM-based AM systems and consolidate evaluation practices spanning component-level accuracy, soft-label quality assessment, and LLM-judge reliability. Finally, we outline persistent challenges, including long-context reasoning, multimodal and multilingual robustness, interpretability, and cost-efficient deployment, and propose a forward-looking research agenda for LLM-driven computational argumentation.
IntelliProof: An Argumentation Network-based Conversational Helper for Organized Reflection
Miandoab, Kaveh Eskandari, Kowalyshyn, Katharine, Pamnani, Kabir, Gavhera, Anesu, Sarathy, Vasanth, Scheutz, Matthias
IntelliProof structures an essay as an argumentation graph, where claims are represented as nodes, supporting evidence is attached as node properties, and edges encode supporting or attacking relations. Unlike existing automated essay scoring systems, IntelliProof emphasizes the user experience: each relation is initially classified and scored by an LLM, then visualized for enhanced understanding. The system provides justifications for classifications and produces quantitative measures for essay coherence. It enables rapid exploration of argumentative quality while retaining human oversight. In addition, IntelliProof provides a set of tools for a better understanding of an argumentative essay and its corresponding graph in natural language, bridging the gap between the structural semantics of argumentative essays and the user's understanding of a given text.
- North America > United States > New Mexico (0.15)
- Europe > Austria > Vienna (0.15)
- Education > Assessment & Standards > Student Performance (0.57)
- Education > Educational Technology > Educational Software (0.56)
End-to-End Argument Mining through Autoregressive Argumentative Structure Prediction
Das, Nilmadhab, Vaibhav, Vishal, Choudhary, Yash Sunil, Saradhi, V. Vijaya, Anand, Ashish
Abstract--Argument Mining (AM) helps in automating the extraction of complex argumentative structures such as Argument Components (ACs) like Premise, Claim etc. and Argumentative Relations (ARs) like Support, Attack etc. in an argumentative text. Due to the inherent complexity of reasoning involved with this task, modelling dependencies between ACs and ARs is challenging. Most of the recent approaches formulate this task through a generative paradigm by flattening the argumentative structures. In contrast to that, this study jointly formulates the key tasks of AM in an end-to-end fashion using Autoregressive Argumentative Structure Prediction (AASP) framework. The proposed AASP framework is based on the autoregressive structure prediction framework that has given good performance for several NLP tasks. AASP framework models the argumentative structures as constrained pre-defined sets of actions with the help of a conditional pre-trained language model. These actions build the argumentative structures step-by-step in an autoregressive manner to capture the flow of argumentative reasoning in an efficient way. Extensive experiments conducted on three standard AM benchmarks demonstrate that AASP achieves state-of-the-art (SoT A) results across all AM tasks in two benchmarks and delivers strong results in one benchmark.
- North America > United States > Texas (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- Oceania > Australia > New South Wales (0.04)
- (11 more...)
The Argument is the Explanation: Structured Argumentation for Trust in Agents
Cakar, Ege, Kristensson, Per Ola
Humans are black boxes -- we cannot observe their neural processes, yet society functions by evaluating verifiable arguments. AI explainability should follow this principle: stakeholders need verifiable reasoning chains, not mechanistic transparency. We propose using structured argumentation to provide a level of explanation and verification neither interpretability nor LLM-generated explanation is able to offer. Our pipeline achieves state-of-the-art 94.44 macro F1 on the AAEC published train/test split (5.7 points above prior work) and $0.81$ macro F1, $\sim$0.07 above previous published results with comparable data setups, for Argumentative MicroTexts relation classification, converting LLM text into argument graphs and enabling verification at each inferential step. We demonstrate this idea on multi-agent risk assessment using the Structured What-If Technique, where specialized agents collaborate transparently to carry out risk assessment otherwise achieved by humans alone. Using Bipolar Assumption-Based Argumentation, we capture support/attack relationships, thereby enabling automatic hallucination detection via fact nodes attacking arguments. We also provide a verification mechanism that enables iterative refinement through test-time feedback without retraining. For easy deployment, we provide a Docker container for the fine-tuned AMT model, and the rest of the code with the Bipolar ABA Python package on GitHub.
- North America > United States > Mississippi > Lafayette County > Oxford (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- (5 more...)
- Information Technology > Security & Privacy (0.71)
- Government (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
AMELIA: A Family of Multi-task End-to-end Language Models for Argumentation
Argument mining is a subfield of argumentation that aims to automatically extract argumentative structures and their relations from natural language texts. This paper investigates how a single large language model can be leveraged to perform one or several argument mining tasks. Our contributions are two-fold. First, we construct a multi-task dataset by surveying and converting 19 well-known argument mining datasets from the literature into a unified format. Second, we explore various training strategies using Meta AI's Llama-3.1-8B-Instruct model: (1) fine-tuning on individual tasks, (2) fine-tuning jointly on multiple tasks, and (3) merging models fine-tuned separately on individual tasks. Our experiments show that task-specific fine-tuning significantly improves individual performance across all tasks. Moreover, multi-task fine-tuning maintains strong performance without degradation, suggesting effective transfer learning across related tasks. Finally, we demonstrate that model merging offers a viable compromise: it yields competitive performance while mitigating the computational costs associated with full multi-task fine-tuning.
- Europe (1.00)
- North America > United States (0.46)
- Government (0.93)
- Law (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1
Pietroń, Marcin, Olszowski, Rafał, Gomułka, Jakub, Gampel, Filip, Tomski, Andrzej
Argument mining (AM) is an interdisciplinary research field that integrates insights from logic, philosophy, linguistics, rhetoric, law, psychology, and computer science. It involves the automatic identification and extraction of argumentative components, such as premises and claims, and the detection of relationships between them, such as support, attack, or neutrality. Recently, the field has advanced significantly, especially with the advent of large language models (LLMs), which have enhanced the efficiency of analyzing and extracting argument semantics compared to traditional methods and other deep learning models. There are many benchmarks for testing and verifying the quality of LLM, but there is still a lack of research and results on the operation of these models in publicly available argument classification databases. This paper presents a study of a selection of LLM's, using diverse datasets such as Args.me and UKP. The models tested include versions of GPT, Llama, and DeepSeek, along with reasoning-enhanced variants incorporating the Chain-of-Thoughts algorithm. The results indicate that ChatGPT-4o outperforms the others in the argument classification benchmarks. In case of models incorporated with reasoning capabilities, the Deepseek-R1 shows its superiority. However, despite their superiority, GPT-4o and Deepseek-R1 still make errors. The most common errors are discussed for all models. To our knowledge, the presented work is the first broader analysis of the mentioned datasets using LLM and prompt algorithms. The work also shows some weaknesses of known prompt algorithms in argument analysis, while indicating directions for their improvement. The added value of the work is the in-depth analysis of the available argument datasets and the demonstration of their shortcomings.
- Europe (0.28)
- North America > United States (0.28)
- Asia > Middle East (0.28)
- Law (1.00)
- Government (1.00)
- Energy (1.00)
- (3 more...)
Medical Argument Mining: Exploitation of Scarce Data Using NLI Systems
Urruela, Maitane, Martín, Sergio, De la Iglesia, Iker, Barrena, Ander
In recent years, there has been a growing interest in developing intelligent systems to assist healthcare professionals, particularly in the field of Evidence-Based Medicine (EBM). EBM systems aim to extract pertinent information from unstructured clinical documents and transform it into a structured, machine-readable format, enabling automated analysis. Argument Mining (AM), aligning with EBM, examines the evidence and reasoning clinicians use in clinical cases. This process involves identifying argumentative structures within texts--specifically, finding claims (a point to be proved) and premises (evidence that supports or refutes a claim), and establishing support or attack relations between them. In the clinical context, this process enables the extraction of logical relationships that justify clinical decision-making (Stylianou and Vlahavas, 2021).
- Europe > Spain > Basque Country (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Italy > Lazio > Rome (0.04)