CANDY: Benchmarking LLMs' Limitations and Assistive Potential in Chinese Misinformation Fact-Checking
Guo, Ruiling; Yang, Xinwei; Huang, Chen; Zhang, Tong; Hu, Yong
arXiv.org Artificial Intelligence
The effectiveness of large language models (LLMs) in fact-checking misinformation remains uncertain, despite their growing use. To address this, we present CANDY, a benchmark designed to systematically evaluate the capabilities and limitations of LLMs in fact-checking Chinese misinformation. Specifically, we curate a carefully annotated dataset of ~20k instances. Our analysis shows that current LLMs exhibit limitations in generating accurate fact-checking conclusions, even when enhanced with chain-of-thought reasoning and few-shot prompting. To understand these limitations, we develop a taxonomy that categorizes flawed LLM-generated explanations for their conclusions, and we identify factual fabrication as the most common failure mode. Although LLMs alone are unreliable for fact-checking, our findings indicate that they have considerable potential to augment human performance when deployed as assistive tools. Our dataset and code can be accessed at https://github.com/SCUNLP/CANDY
Sep-5-2025
- Country:
- North America > United States (1.00)
- Asia > China (0.94)
- Genre:
- Research Report
- Experimental Study (0.92)
- New Finding (0.87)