CANDY: Benchmarking LLMs' Limitations and Assistive Potential in Chinese Misinformation Fact-Checking

Guo, Ruiling, Yang, Xinwei, Huang, Chen, Zhang, Tong, Hu, Yong

Sep-5-2025–arXiv.org Artificial Intelligence

The effectiveness of large language models (LLMs) in fact-checking misinformation remains uncertain, despite their growing use. To this end, we present CANDY, a benchmark designed to systematically evaluate the capabilities and limitations of LLMs in fact-checking Chinese misinformation. Specifically, we curate a carefully annotated dataset of ~20k instances. Our analysis shows that current LLMs are limited in their ability to generate accurate fact-checking conclusions, even when enhanced with chain-of-thought reasoning and few-shot prompting. To understand these limitations, we develop a taxonomy that categorizes the flawed explanations LLMs generate for their conclusions, and we identify factual fabrication as the most common failure mode. Although LLMs alone are unreliable for fact-checking, our findings indicate considerable potential for them to augment human performance when deployed as assistive tools. Our dataset and code can be accessed at https://github.com/SCUNLP/CANDY.
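
The abstract describes two measurements: how often an LLM's fact-checking conclusion is correct, and which category in the flawed-explanation taxonomy occurs most often. The sketch below illustrates both computations in a minimal form. It is not CANDY's released code; the label strings ("true", "false") and the second failure category are hypothetical placeholders, and only "factual fabrication" comes from the abstract. The actual label set and taxonomy are defined in the repository linked above.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def conclusion_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of instances where the model's conclusion matches the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def most_common_failure(categories: Iterable[str]) -> Tuple[str, int]:
    """Return the most frequent flawed-explanation category and its count."""
    counts = Counter(categories)
    if not counts:
        raise ValueError("no annotated failures")
    return counts.most_common(1)[0]


# Illustrative data only: hypothetical labels, not drawn from the CANDY dataset.
predicted = ["false", "true", "false", "false"]
gold = ["false", "false", "false", "true"]
failures = ["factual fabrication", "hypothetical_other_category", "factual fabrication"]

print(conclusion_accuracy(predicted, gold))  # 0.5
print(most_common_failure(failures))         # ('factual fabrication', 2)
```

Keeping accuracy and failure tallies as separate functions mirrors the abstract's split between judging conclusions and diagnosing the explanations behind them.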

Keywords: explanation, large language model, machine learning, and 17 more

Country:
- North America > United States (1.00)
- Asia > China (0.94)

Genre:
- Research Report
  - Experimental Study (0.92)
  - New Finding (0.87)

Industry:
- Media > News (1.00)
- Health & Medicine > Therapeutic Area
  - Infections and Infectious Diseases (1.00)
  - Immunology (1.00)
- Government > Regional Government
  - North America Government > United States Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
