AITopics | claim verification

AVerImaTeC: A Dataset for Automatic Verification of Image-Text Claims with Evidence from the Web

Neural Information Processing Systems, Jun-14-2026, 11:22:04 GMT

Textual claims are often accompanied by images to enhance their credibility and spread on social media, but this also raises concerns about the spread of misinformation. Existing datasets for automated verification of image-text claims remain limited, as they often consist of synthetic claims and lack evidence annotations to capture the reasoning behind the verdict. In this work, we introduce AVerImaTeC, a dataset consisting of 1,297 real-world image-text claims. Each claim is annotated with question-answer (QA) pairs containing evidence from the web, reflecting a decomposed reasoning regarding the verdict. We mitigate common challenges in fact-checking datasets such as contextual dependence, temporal leakage, and evidence insufficiency, via claim normalization, temporally constrained evidence annotation, and a two-stage sufficiency check. We assess the consistency of the annotation in AVerImaTeC via inter-annotator studies, achieving a κ = 0.742 on verdicts and 74.7% consistency on QA pairs. We also propose a novel evaluation method for evidence retrieval and conduct extensive experiments to establish baselines for verifying image-text claims using open-web evidence.

large language model, machine learning, question answering, (25 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Media > News (1.00)
Information Technology > Security & Privacy (1.00)
Government (0.93)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
(6 more...)


Automatic Fact-checking in English and Telugu

Chikkala, Ravi Kiran, Anikina, Tatiana, Skachkova, Natalia, Vykopal, Ivan, Agerri, Rodrigo, van Genabith, Josef

arXiv.org Artificial Intelligence, Dec-11-2025

False information poses a significant global challenge, and manually verifying claims is a time-consuming and resource-intensive process. In this research paper, we experiment with different approaches to investigate the effectiveness of large language models (LLMs) in classifying factual claims by their veracity and generating justifications in English and Telugu. The key contributions of this work include the creation of a bilingual English-Telugu dataset and the benchmarking of different veracity classification approaches based on LLMs.

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.26415

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Government (1.00)
Media > News (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)


Format Matters: The Robustness of Multimodal LLMs in Reviewing Evidence from Tables and Charts

Ho, Xanh, Wu, Yun-Ang, Kumar, Sunisth, Boudin, Florian, Takasu, Atsuhiro, Aizawa, Akiko

arXiv.org Artificial Intelligence, Nov-14-2025

With the growing number of submitted scientific papers, there is an increasing demand for systems that can assist reviewers in evaluating research claims. Experimental results are a core component of scientific work, often presented in varying formats such as tables or charts. Understanding how robust current multimodal large language models (multimodal LLMs) are at verifying scientific claims across different evidence formats remains an important and underexplored challenge. In this paper, we design and conduct a series of experiments to assess the ability of multimodal LLMs to verify scientific claims using both tables and charts as evidence. To enable this evaluation, we adapt two existing datasets of scientific papers by incorporating annotations and structures necessary for a multimodal claim verification task. Using this adapted dataset, we evaluate 12 multimodal LLMs and find that current models perform better with table-based evidence while struggling with chart-based evidence. We further conduct human evaluations and observe that humans maintain strong performance across both formats, unlike the models. Our analysis also reveals that smaller multimodal LLMs (under 8B) show weak correlation in performance between table-based and chart-based tasks, indicating limited cross-modal generalization. These findings highlight a critical gap in current models' multimodal reasoning capabilities. We suggest that future multimodal LLMs should place greater emphasis on improving chart understanding to better support scientific claim verification.

computational linguistic, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.10075

Country:

Asia (1.00)
North America > United States (0.68)
Europe > Austria > Vienna (0.15)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


SynClaimEval: A Framework for Evaluating the Utility of Synthetic Data in Long-Context Claim Verification

Elaraby, Mohamed, Maheswari, Jyoti Prakash

arXiv.org Artificial Intelligence, Nov-13-2025

Large Language Models (LLMs) with extended context windows promise direct reasoning over long documents, reducing the need for chunking or retrieval. Constructing annotated resources for training and evaluation, however, remains costly. Synthetic data offers a scalable alternative, and we introduce SynClaimEval, a framework for evaluating synthetic data utility in long-context claim verification -- a task central to hallucination detection and fact-checking. Our framework examines three dimensions: (i) input characteristics, by varying context length and testing generalization to out-of-domain benchmarks; (ii) synthesis logic, by controlling claim complexity and error type variation; and (iii) explanation quality, measuring the degree to which model explanations provide evidence consistent with predictions. Experiments across benchmarks show that long-context synthesis can improve verification in base instruction-tuned models, particularly when augmenting existing human-written datasets. Moreover, synthesis enhances explanation quality, even when verification scores do not improve, underscoring its potential to strengthen both performance and explainability.

computational linguistic, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.09539

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Industry:

Banking & Finance (0.46)
Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


A Benchmark for Open-Domain Numerical Fact-Checking Enhanced by Claim Decomposition

Venktesh, V, Prabhu, Deepali, Anand, Avishek

arXiv.org Artificial Intelligence, Oct-28-2025

Fact-checking numerical claims is critical because the presence of numbers lends a mirage of veracity to false claims, potentially causing catastrophic impacts on society. Prior work in automatic fact verification does not primarily focus on natural numerical claims. A typical human fact-checker first retrieves relevant evidence addressing the different numerical aspects of the claim and then reasons about them to predict the veracity of the claim. Hence, the search process of a human fact-checker is a crucial skill that forms the foundation of the verification process. Emulating a real-world setting is essential to aid in the development of automated methods that encompass such skills. However, existing benchmarks employ heuristic claim decomposition approaches augmented with weakly supervised web search to collect evidence for verifying claims. This sometimes results in less relevant evidence and noisy sources with temporal leakage, rendering the retrieval setting for claim verification less realistic. Hence, we introduce QuanTemp++: a dataset consisting of natural numerical claims and an open-domain corpus, with the corresponding relevant evidence for each claim. The evidence is collected through a claim decomposition process approximately emulating the approach of a human fact-checker and veracity labels, ensuring there is no temporal leakage. Given this dataset, we also characterize the retrieval performance of key claim decomposition paradigms. Finally, we observe their effect on the outcome of the verification pipeline and draw insights. The code for the data pipeline, along with a link to the data, can be found at https://github.com/VenkteshV/QuanTemp_Plus

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.22055

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.68)

Genre: Research Report (0.64)

Industry:

Health & Medicine (0.94)
Media > News (0.93)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)


AVerImaTeC: A Dataset for Automatic Verification of Image-Text Claims with Evidence from the Web

Cao, Rui, Ding, Zifeng, Guo, Zhijiang, Schlichtkrull, Michael, Vlachos, Andreas

arXiv.org Artificial Intelligence, Oct-8-2025

Textual claims are often accompanied by images to enhance their credibility and spread on social media, but this also raises concerns about the spread of misinformation. Existing datasets for automated verification of image-text claims remain limited, as they often consist of synthetic claims and lack evidence annotations to capture the reasoning behind the verdict. In this work, we introduce AVerImaTeC, a dataset consisting of 1,297 real-world image-text claims. Each claim is annotated with question-answer (QA) pairs containing evidence from the web, reflecting a decomposed reasoning regarding the verdict. We mitigate common challenges in fact-checking datasets such as contextual dependence, temporal leakage, and evidence insufficiency, via claim normalization, temporally constrained evidence annotation, and a two-stage sufficiency check. We assess the consistency of the annotation in AVerImaTeC via inter-annotator studies, achieving a κ = 0.742 on verdicts and 74.7% consistency on QA pairs. We also propose a novel evaluation method for evidence retrieval and conduct extensive experiments to establish baselines for verifying image-text claims using open-web evidence.
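
For readers unfamiliar with the κ figure quoted in the abstract above, the following is a minimal sketch of how a chance-corrected agreement score can be computed. The abstract does not say which κ variant the authors used; this sketch assumes Cohen's kappa for two annotators, and the function name `cohens_kappa` and the toy verdict labels are hypothetical, invented purely for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two raters and nominal labels. The paper may use a
    different kappa variant; this is only an illustration.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy verdicts, invented for illustration (not taken from the dataset).
annotator_1 = ["supported", "refuted", "refuted", "supported", "refuted", "supported"]
annotator_2 = ["supported", "refuted", "supported", "supported", "refuted", "refuted"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the two annotators agree on 4 of 6 items, while chance alone would give agreement on half of them, so κ is far lower than the raw agreement rate. This is why a chance-corrected score is typically reported alongside, or instead of, a plain percentage.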

large language model, machine learning, question answering, (26 more...)

arXiv.org Artificial Intelligence

2505.17978

Country: North America > United States (0.45)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Media > News (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
(7 more...)


Veri-R1: Toward Precise and Faithful Claim Verification via Online Reinforcement Learning

He, Qi, Qian, Cheng, Chen, Xiusi, He, Bingxiang, Fung, Yi R., Ji, Heng

arXiv.org Artificial Intelligence, Oct-7-2025

Claim verification with large language models (LLMs) has recently attracted growing attention, due to their strong reasoning capabilities and transparent verification processes compared to traditional answer-only judgments. However, existing approaches to online claim verification, which requires iterative evidence retrieval and reasoning, still mainly rely on prompt engineering or pre-designed reasoning workflows, without unified training to improve the necessary skills. Therefore, we introduce Veri-R1, an online reinforcement learning (RL) framework that enables an LLM to interact with a search engine and to receive reward signals that explicitly shape its planning, retrieval, and reasoning behaviors. This dynamic interaction of the LLM with retrieval systems more accurately reflects real-world verification scenarios and fosters comprehensive verification skills. Empirical results show that Veri-R1 improves joint accuracy by up to 30% and doubles the evidence score, often surpassing its larger-scale model counterparts. Ablation studies further reveal the impact of reward components, and the link between output logits and label accuracy. Our results highlight the effectiveness of online RL for precise and faithful claim verification, providing an important foundation for future research. We release our code to support community progress in LLM-empowered claim verification.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.01932

Country: North America > United States > West Virginia > Kanawha County (0.28)

Genre:

Research Report > New Finding (0.86)
Instructional Material > Online (0.61)

Industry:

Leisure & Entertainment > Sports (0.46)
Law (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)


Evaluating Uncertainty Quantification Methods in Argumentative Large Language Models

Zhou, Kevin, Dejl, Adam, Freedman, Gabriel, Chen, Lihu, Rago, Antonio, Toni, Francesca

arXiv.org Artificial Intelligence, Oct-6-2025

Research in uncertainty quantification (UQ) for large language models (LLMs) is increasingly important for guaranteeing the reliability of this groundbreaking technology. We explore the integration of LLM UQ methods in argumentative LLMs (ArgLLMs), an explainable LLM framework for decision-making based on computational argumentation in which UQ plays a critical role. We conduct experiments to evaluate ArgLLMs' performance on claim verification tasks when using different LLM UQ methods, inherently performing an assessment of the UQ methods' effectiveness. Moreover, the experimental procedure itself is a novel way of evaluating the effectiveness of UQ methods, especially when intricate and potentially contentious statements are present. Our results demonstrate that, despite its simplicity, direct prompting is an effective UQ strategy in ArgLLMs, outperforming considerably more complex approaches.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.02339

Country: North America > United States (0.47)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)


MuPlon: Multi-Path Causal Optimization for Claim Verification through Controlling Confounding

Guo, Hanghui, Di, Shimin, De Meo, Pasquale, Chen, Zhangze, Zhu, Jia

arXiv.org Artificial Intelligence, Oct-1-2025

As a critical task in data quality control, claim verification aims to curb the spread of misinformation by assessing the truthfulness of claims based on a wide range of evidence. However, traditional methods often overlook the complex interactions between evidence, leading to unreliable verification results. A straightforward solution represents the claim and evidence as a fully connected graph, which we define as the Claim-Evidence Graph (C-E Graph). Nevertheless, claim verification methods based on fully connected graphs face two primary confounding challenges, Data Noise and Data Biases. To address these challenges, we propose a novel framework, Multi-Path Causal Optimization (MuPlon). In the front-door path, MuPlon extracts highly relevant subgraphs and constructs reasoning paths, further applying counterfactual reasoning to eliminate data biases within these paths. The experimental results demonstrate that MuPlon outperforms existing methods and achieves state-of-the-art performance.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.25715

Genre:

Research Report > Strength High (0.46)
Research Report > Experimental Study (0.46)

Industry: Media > News (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)


Table-Text Alignment: Explaining Claim Verification Against Tables in Scientific Papers

Ho, Xanh, Kumar, Sunisth, Wu, Yun-Ang, Boudin, Florian, Takasu, Atsuhiro, Aizawa, Akiko

arXiv.org Artificial Intelligence, Sep-18-2025

Scientific claim verification against tables typically requires predicting whether a claim is supported or refuted given a table. However, we argue that predicting the final label alone is insufficient: it reveals little about the model's reasoning and offers limited interpretability. To address this, we reframe table-text alignment as an explanation task, requiring models to identify the table cells essential for claim verification. We build a new dataset by extending the SciTab benchmark with human-annotated cell-level rationales. Annotators verify the claim label and highlight the minimal set of cells needed to support their decision. After the annotation process, we utilize the collected information and propose a taxonomy for handling ambiguous cases. Our experiments show that (i) incorporating table alignment information improves claim verification performance, and (ii) most LLMs, while often predicting correct labels, fail to recover human-aligned rationales, suggesting that their predictions do not stem from faithful reasoning.

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2506.10486

Country:

Europe (0.68)
Asia > Japan (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
