Existing LLMs Are Not Self-Consistent For Simple Tasks
Lin, Zhenru, Tao, Jiawen, Yuan, Yang, Yao, Andrew Chi-Chih
arXiv.org Artificial Intelligence
Large Language Models (LLMs) have grown increasingly powerful, yet ensuring that their decisions remain transparent and trustworthy requires self-consistency, meaning no contradictions in their internal reasoning. Our study reveals that even on simple tasks, such as comparing points on a line or a plane, or reasoning in a family tree, all smaller models are highly inconsistent, and even state-of-the-art models like DeepSeek-R1 and GPT-o4-mini are not fully self-consistent. To quantify and mitigate these inconsistencies, we introduce inconsistency metrics and propose two automated methods: a graph-based approach and an energy-based approach. While these fixes provide partial improvements, they also highlight the complexity and importance of self-consistency in building more reliable and interpretable AI. The code and data are available at https://github.com/scorpio-nova/llm-self-consistency.
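To make the idea of an inconsistency on the "points on a line" task concrete, the following is a minimal sketch and not the authors' metric or code. It assumes the model's pairwise answers are collected as a set of `(a, b)` pairs meaning "a is less than b", and it reports the fraction of fully compared triples whose answers form a cycle (for example A < B, B < C, C < A), which no ordering of points on a line can satisfy. The function name `inconsistency_rate` and the input format are hypothetical.

```python
from itertools import combinations


def inconsistency_rate(points, less_than):
    """Fraction of fully compared triples whose answers form a 3-cycle.

    `less_than` is a set of (a, b) pairs meaning the model answered "a < b".
    This is an illustrative measure, not the metric defined in the paper.
    """
    checked = 0
    cyclic = 0
    for a, b, c in combinations(points, 3):
        pairs = [(a, b), (b, c), (a, c)]
        # Only score triples where every pair was answered in exactly one direction.
        if not all(((x, y) in less_than) != ((y, x) in less_than) for x, y in pairs):
            continue
        checked += 1
        # The two cyclic orientations of a triple: a<b<c<a and a<c<b<a.
        forward = (a, b) in less_than and (b, c) in less_than and (c, a) in less_than
        backward = (a, c) in less_than and (c, b) in less_than and (b, a) in less_than
        if forward or backward:
            cyclic += 1
    return cyclic / checked if checked else 0.0


# Hypothetical answers: A < B, B < C, but also C < A; D is placed consistently.
answers = {("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("B", "D"), ("C", "D")}
rate = inconsistency_rate(["A", "B", "C", "D"], answers)
```

Here one of the four triples (A, B, C) is cyclic, so `rate` is 0.25. Viewing the answers as a directed graph in this way is only loosely in the spirit of the graph-based approach mentioned in the abstract, whose actual construction is given in the paper and repository.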
Jun-24-2025
- Genre:
- Research Report > New Finding (0.68)