logical fallacy
Socrates or Smartypants: Testing Logic Reasoning Capabilities of Large Language Models with Logic Programming-based Test Oracles
Xu, Zihao, Ding, Junchen, Lou, Yiling, Zhang, Kun, Gong, Dong, Li, Yuekang
Large Language Models (LLMs) have achieved significant progress in language understanding and reasoning. Evaluating and analyzing their logical reasoning abilities has therefore become essential. However, existing datasets and benchmarks are often limited to overly simplistic, unnatural, or contextually constrained examples. In response to the growing demand, we introduce SmartyPat-Bench, a challenging, naturally expressed, and systematically labeled benchmark derived from real-world high-quality Reddit posts containing subtle logical fallacies. Unlike existing datasets and benchmarks, it provides more detailed annotations of logical fallacies and features more diverse data. To further scale up the study and address the limitations of manual data collection and labeling, such as fallacy-type imbalance and labor-intensive annotation, we introduce SmartyPat, an automated framework powered by logic programming-based oracles. SmartyPat utilizes Prolog rules to systematically generate logically fallacious statements, which are then refined into fluent natural-language sentences by LLMs, ensuring precise fallacy representation. Extensive evaluation demonstrates that SmartyPat produces fallacies comparable in subtlety and quality to human-generated content and significantly outperforms baseline methods. Finally, experiments reveal nuanced insights into LLM capabilities, highlighting that while excessive reasoning steps hinder fallacy detection accuracy, structured reasoning enhances fallacy categorization performance.
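To make the pipeline concrete, here is a minimal Python sketch of a logic-programming-style oracle: a rule derives fallacious "A causes B" claims from mere temporal precedence, and a template stands in for the LLM rewriting step. The fact base, rule, and `render_claim` helper are illustrative assumptions, not SmartyPat's actual Prolog rules or API.

```python
# Hypothetical sketch: a post-hoc-ergo-propter-hoc "oracle" in plain Python.
# SmartyPat encodes such rules in Prolog; this only mirrors the structure.

# Toy fact base: event A was observed before event B.
PRECEDES = {
    ("the rooster crowed", "the sun rose"),
    ("I wore my lucky socks", "my team won"),
}

def post_hoc_claims(facts):
    """Derive 'A causes B' claims from mere temporal precedence,
    the skeleton of a post hoc ergo propter hoc fallacy."""
    for a, b in facts:
        yield {"fallacy": "post_hoc", "cause": a, "effect": b}

def render_claim(claim):
    """Turn a derived claim into a rough sentence; in the full pipeline an
    LLM rewrites this into fluent, natural-sounding text."""
    return (f"Because {claim['cause']} and then {claim['effect']}, "
            f"{claim['cause']} must be the reason {claim['effect']}.")

for claim in post_hoc_claims(PRECEDES):
    print(claim["fallacy"], "->", render_claim(claim))
```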
Follow My Lead: Logical Fallacy Classification with Knowledge-Augmented LLMs
Wang, Olivia Peiyu, Bansal, Tashvi, Bai, Ryan, Chui, Emily M., Gilpin, Leilani H.
Large Language Models (LLMs) suffer from critical reasoning gaps, including a tendency to hallucinate and poor accuracy in classifying logical fallacies. This limitation stems from their default System 1 processing, which is fast and intuitive, whereas reliable reasoning requires the deliberate, effortful System 2 approach (Kahneman, 2011; Li et al., 2025). Since full System 2 training is often prohibitively expensive, we explore a low-cost, instruction-based intervention to bridge this gap. Our methodology introduces a novel stepwise instruction dataset that decomposes fallacy classification into a series of atomic procedural steps (simple binary questions). We further augment this with a final verification step where models consult a relational knowledge graph of related fallacies. This procedural, rule-based intervention yields a significant improvement in LLM logical fallacy classification. Crucially, the approach also provides enhanced transparency into the LLMs' decision-making, highlighting a practical pathway for Neuro-symbolic architectures to address LLM reasoning deficits.
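A minimal sketch of the stepwise idea, assuming toy question wording and a two-node relational graph (neither is the paper's actual dataset): each candidate fallacy is reduced to atomic yes/no checks, and a final step consults related fallacies before committing to a label.

```python
# Minimal sketch of stepwise fallacy classification as atomic binary
# questions plus a knowledge-graph verification step. The questions and
# the tiny graph below are illustrative assumptions.

# Each candidate fallacy is reduced to simple yes/no checks.
STEPWISE_QUESTIONS = {
    "ad_hominem": [
        "Does the text respond to an argument?",
        "Does it attack the speaker rather than the argument's content?",
    ],
    "slippery_slope": [
        "Does the text predict a chain of consequences?",
        "Is the chain asserted without evidence for each link?",
    ],
}

# Relational graph of commonly confused fallacies, consulted as a final
# verification step before committing to a label.
RELATED = {"ad_hominem": ["tu_quoque"], "slippery_slope": ["false_cause"]}

def classify(answer_fn):
    """answer_fn(question) -> bool; in the paper an LLM answers each
    atomic question, here it can be any callable."""
    for label, questions in STEPWISE_QUESTIONS.items():
        if all(answer_fn(q) for q in questions):
            # Verification: re-check the label against its neighbours.
            return label, RELATED.get(label, [])
    return "no_fallacy", []

# Toy run with a hard-coded "annotator".
label, check = classify(lambda q: "attack" in q or "respond" in q)
print(label, "verify against:", check)
```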
Large Language Models Are Better Logical Fallacy Reasoners with Counterargument, Explanation, and Goal-Aware Prompt Formulation
Jeong, Jiwon, Jang, Hyeju, Park, Hogun
The advancement of Large Language Models (LLMs) has greatly improved our ability to process complex language. However, accurately detecting logical fallacies remains a significant challenge. This study presents a novel and effective prompt formulation approach for logical fallacy detection, applicable in both supervised (fine-tuned) and unsupervised (zero-shot) settings. Our method enriches the input text by incorporating implicit contextual information (counterarguments, explanations, and goals), which we query for validity within the context of the argument. We then rank these queries based on confidence scores to inform classification. We evaluate our approach across multiple datasets from 5 domains, covering 29 distinct fallacy types, using models from the GPT and LLaMA series. The results show substantial improvements over state-of-the-art models, with F1 score increases of up to 0.60 in zero-shot settings and up to 0.45 in fine-tuned models. Extensive analyses further illustrate why and how our method excels.
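A minimal sketch of the prompt-formulation step, with hypothetical query templates and a dummy confidence function standing in for the LLM's validity scores: the input is enriched with counterargument, explanation, and goal queries, and the queries are ranked by confidence to inform classification.

```python
# Minimal sketch of counterargument / explanation / goal prompt enrichment.
# The templates and the scoring stub are assumptions, not the paper's prompts.

QUERY_TEMPLATES = {
    "counterargument": "Is the following a valid counterargument to the text: {aug}?",
    "explanation":     "Is the following a valid explanation of the text: {aug}?",
    "goal":            "Is the following a valid goal of the text: {aug}?",
}

def score_queries(text, augmentations, confidence_fn):
    """augmentations: {kind: generated context}. confidence_fn returns a
    validity score in [0, 1] (an LLM confidence in the paper)."""
    scores = {}
    for kind, aug in augmentations.items():
        prompt = f"Text: {text}\n" + QUERY_TEMPLATES[kind].format(aug=aug)
        scores[kind] = confidence_fn(prompt)
    # Rank the implicit-context queries by confidence; the ranking, not a
    # single score, informs the final fallacy decision.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage with a dummy confidence function.
ranking = score_queries(
    "Everyone I know likes this policy, so it must be right.",
    {"counterargument": "Popularity does not establish correctness.",
     "explanation": "The author infers truth from popularity.",
     "goal": "Persuade the reader by appeal to popularity."},
    confidence_fn=lambda prompt: float(len(prompt) % 7) / 7.0,
)
print(ranking)
```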
Boosting Logical Fallacy Reasoning in LLMs via Logical Structure Tree
Logical fallacies use invalid or faulty reasoning in the construction of a statement. Despite the prevalence and harmfulness of logical fallacies, detecting and classifying them remain challenging tasks. We observe that logical fallacies often use connective words to indicate an intended logical relation between two arguments, while the argument semantics does not actually support that relation. Inspired by this observation, we propose to build a logical structure tree to explicitly represent and track the hierarchical logic flow among relation connectives and their arguments in a statement. Specifically, this logical structure tree is constructed in an unsupervised manner, guided by the constituency tree and a taxonomy of connectives for ten common logical relations, with relation connectives as non-terminal nodes and textual arguments as terminal nodes, the latter being mostly elementary discourse units. We further develop two strategies to incorporate the logical structure tree into LLMs for fallacy reasoning. First, we transform the tree into natural language descriptions and feed the textualized tree into LLMs as part of the hard text prompt. Second, we derive a relation-aware tree embedding and insert it into LLMs as a soft prompt. Experiments on benchmark datasets demonstrate that our approach based on the logical structure tree significantly improves precision and recall for both fallacy detection and fallacy classification.
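A minimal sketch of the tree construction and the hard-prompt textualization, assuming a toy connective list and naive splitting; the paper instead derives the tree from a constituency parse and a taxonomy of connectives for ten logical relations.

```python
# Minimal sketch of a logical structure tree: relation connectives become
# internal nodes, the text spans they connect become leaves, and the tree
# is flattened into a textual description usable as a hard prompt.
# The connective list and the splitting heuristic are illustrative only.

CONNECTIVES = {"because": "cause", "so": "result", "if": "condition", "but": "contrast"}

def build_tree(sentence):
    """Recursively split on the first known connective found."""
    words = sentence.strip(" .").split()
    for i, w in enumerate(words):
        rel = CONNECTIVES.get(w.lower())
        if rel and i > 0:
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            return {"relation": rel, "children": [build_tree(left), build_tree(right)]}
    return {"leaf": sentence.strip(" .")}

def textualize(node, depth=0):
    """Turn the tree into an indented description for the LLM prompt."""
    pad = "  " * depth
    if "leaf" in node:
        return f"{pad}- argument: {node['leaf']}"
    lines = [f"{pad}- relation: {node['relation']}"]
    lines += [textualize(c, depth + 1) for c in node["children"]]
    return "\n".join(lines)

tree = build_tree("He is rich so he must be right because everyone trusts him.")
print(textualize(tree))
```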
CoCoLoFa: A Dataset of News Comments with Common Logical Fallacies Written by LLM-Assisted Crowds
Yeh, Min-Hsuan, Wan, Ruyuan, Huang, Ting-Hao 'Kenneth'
Detecting logical fallacies in texts can help users spot argument flaws, but automating this detection is not easy. Manually annotating fallacies in large-scale, real-world text data to create datasets for developing and validating detection models is costly. This paper introduces CoCoLoFa, the largest known logical fallacy dataset, containing 7,706 comments for 648 news articles, with each comment labeled for fallacy presence and type. We recruited 143 crowd workers to write comments embodying specific fallacy types (e.g., slippery slope) in response to news articles. Recognizing the complexity of this writing task, we built an LLM-powered assistant into the workers' interface to aid in drafting and refining their comments. Experts rated the writing quality and labeling validity of CoCoLoFa as high and reliable. BERT-based models fine-tuned using CoCoLoFa achieved the highest fallacy detection (F1=0.86) and classification (F1=0.87) performance on its test set, outperforming the state-of-the-art LLMs. Our work shows that combining crowdsourcing and LLMs enables us to more effectively construct datasets for complex linguistic phenomena that crowd workers find challenging to produce on their own.
Face4RAG: Factual Consistency Evaluation for Retrieval Augmented Generation in Chinese
Xu, Yunqi, Cai, Tianchi, Jiang, Jiyan, Song, Xierui
Retrieval Augmented Generation (RAG), where answers are generated from passages retrieved from external retrievers or search engines [27], has demonstrated strong performance on various knowledge-intensive tasks such as open-domain conversation [38, 41] and question answering [19]. Despite its bright prospect, factual consistency remains a critical issue for RAG systems: recent assessments reveal that even for leading-edge commercial RAG systems like Bing Chat and Perplexity, barely over half of their outputs are factually consistent with the references [29]. This urges the study of factual consistency evaluation (FCE) in the RAG task. Various FCE methods have been proposed to evaluate the factual consistency of specific RAG systems, among which a two-step approach shows promising results. However, these methods are evaluated on datasets generated by specific Large Language Models (LLMs). Without a comprehensive benchmark, it remains unexplored how these FCE methods perform on other LLMs with different error distributions or even unseen error types, as they may fail to detect the error types generated by other LLMs. To fill this gap, we propose Face4RAG, the first comprehensive FCE benchmark for RAG that is independent of the underlying LLM. Our benchmark consists of a synthetic dataset built upon a carefully designed typology of factual inconsistency errors and a real-world dataset constructed from six commonly used LLMs, enabling evaluation of FCE methods on specific error types or real-world error distributions.
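A minimal sketch of how such a benchmark can be used, with a placeholder error typology and toy labels rather than Face4RAG's actual categories: each item records the gold error type and an FCE method's verdict, and recall is computed per error type.

```python
# Minimal sketch of scoring a factual-consistency-evaluation (FCE) method
# per error type. The error types and toy items below are placeholders.

from collections import defaultdict

# Each item: the generated answer's gold error type (None if consistent)
# and the FCE method's verdict (True = judged consistent).
benchmark = [
    {"error_type": None,            "fce_says_consistent": True},
    {"error_type": "hallucination", "fce_says_consistent": False},
    {"error_type": "contradiction", "fce_says_consistent": True},   # missed
    {"error_type": "hallucination", "fce_says_consistent": False},
]

def per_error_type_recall(items):
    """How often the FCE method flags each error type as inconsistent."""
    caught, total = defaultdict(int), defaultdict(int)
    for it in items:
        if it["error_type"] is None:
            continue
        total[it["error_type"]] += 1
        if not it["fce_says_consistent"]:
            caught[it["error_type"]] += 1
    return {t: caught[t] / total[t] for t in total}

print(per_error_type_recall(benchmark))
```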
Detecting Fallacies in Climate Misinformation: A Technocognitive Approach to Identifying Misleading Argumentation
Zanartu, Francisco, Cook, John, Wagner, Markus, Garcia, Julian
Misinformation about climate change is a complex societal issue requiring holistic, interdisciplinary solutions at the intersection between technology and psychology. One proposed solution is a "technocognitive" approach, involving the synthesis of psychological and computer science research. Psychological research has identified that interventions in response to misinformation require both fact-based (e.g., factual explanations) and technique-based (e.g., explanations of misleading techniques) content. However, little progress has been made on documenting and detecting fallacies in climate misinformation. In this study, we apply a previously developed critical thinking methodology for deconstructing climate misinformation, in order to develop a dataset mapping different types of climate misinformation to reasoning fallacies. This dataset is used to train a model to detect fallacies in climate misinformation. Our study shows F1 scores that are 2.5 to 3.5 better than previous works. The fallacies that are easiest to detect include fake experts and anecdotal arguments, while fallacies that require background knowledge, such as oversimplification, misrepresentation, and slothful induction, are relatively more difficult to detect. This research lays the groundwork for development of solutions where automatically detected climate misinformation can be countered with generative technique-based corrections.
Can a Hallucinating Model help in Reducing Human "Hallucination"?
Sundaram, Sowmya S, Alwar, Balaji
The prevalence of unwarranted beliefs, spanning pseudoscience, logical fallacies, and conspiracy theories, presents substantial societal hurdles and the risk of disseminating misinformation. Utilizing established psychometric assessments, this study explores the capabilities of large language models (LLMs) vis-a-vis the average human in detecting prevalent logical pitfalls. We undertake a philosophical inquiry, juxtaposing the rationality of humans against that of LLMs. Furthermore, we propose methodologies for harnessing LLMs to counter misconceptions, drawing upon psychological models of persuasion such as cognitive dissonance theory and elaboration likelihood theory. Through this endeavor, we highlight the potential of LLMs as personalized misinformation debunking agents.
NL2FOL: Translating Natural Language to First-Order Logic for Logical Fallacy Detection
Lalwani, Abhinav, Chopra, Lovish, Hahn, Christopher, Trippel, Caroline, Jin, Zhijing, Sachan, Mrinmaya
Logical fallacies are common errors in reasoning that undermine the logic of an argument. Automatically detecting logical fallacies has important applications in tracking misinformation and validating claims. In this paper, we design a process to reliably detect logical fallacies by translating natural language to First-Order Logic (FOL) step-by-step using Large Language Models (LLMs). We then utilize Satisfiability Modulo Theories (SMT) solvers to reason about the validity of the formula and classify inputs as either a fallacy or a valid statement. Our model also provides a novel means of utilizing LLMs to interpret the output of the SMT solver, offering insights into the counter-examples that illustrate why a given sentence is considered a logical fallacy. Our approach is robust, interpretable, and does not require training data or fine-tuning. We evaluate our model on a mixed dataset of fallacies and valid sentences. The results demonstrate improved performance compared to end-to-end LLMs, with our classifier achieving an F1-score of 71% on the Logic dataset. The approach generalizes effectively, achieving an F1-score of 73% on the challenge set, LogicClimate, outperforming state-of-the-art models by 21% despite its much smaller size.
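A minimal sketch of the formalize-then-check step using the Z3 SMT solver (pip install z3-solver). The propositional encoding below is hand-written for illustration; in NL2FOL the (first-order) formulas are produced by an LLM from the natural-language input, an unsatisfiability check decides validity, and a satisfying model serves as the counterexample.

```python
# Sketch: check an argument's validity with the Z3 SMT solver.
from z3 import Bools, Implies, Not, And, Solver, sat

rain, wet = Bools("rain wet")

# "If it rains, the ground is wet. The ground is wet. So it rained."
premises = And(Implies(rain, wet), wet)
conclusion = rain

# The argument is valid iff premises AND NOT(conclusion) is unsatisfiable.
s = Solver()
s.add(premises, Not(conclusion))

if s.check() == sat:
    # A satisfying model is a counterexample explaining the fallacy
    # (here: the ground can be wet without any rain).
    print("fallacy; counterexample:", s.model())
else:
    print("valid argument")
```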
Evaluation of an LLM in Identifying Logical Fallacies: A Call for Rigor When Adopting LLMs in HCI Research
Lim, Gionnieve, Perrault, Simon T.
There is increasing interest in the adoption of LLMs in HCI research. However, LLMs are often treated as a panacea because of their powerful capabilities, with little scrutiny of whether they are suitable for their intended tasks. We contend that LLMs should be adopted in a critical manner following rigorous evaluation. Accordingly, we present the evaluation of an LLM in identifying logical fallacies that will form part of a digital misinformation intervention. Comparing against a labeled dataset, we found that GPT-4 achieves an accuracy of 0.79, and for our intended use case, which excludes invalid or unidentified instances, an accuracy of 0.90. This gives us the confidence to proceed with the application of the LLM while keeping in mind the areas where it still falls short. The paper describes our evaluation approach, results, and reflections on the use of the LLM for our intended task.