Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
Arcuschin, Iván, Janiak, Jett, Krzyzanowski, Robert, Rajamanoharan, Senthooran, Nanda, Neel, Conmy, Arthur
arXiv.org Artificial Intelligence
Chain-of-Thought (CoT) reasoning has significantly advanced state-of-the-art AI capabilities. However, recent studies have shown that CoT reasoning is not always faithful, i.e., it does not always reflect how models arrive at their conclusions. So far, most of these studies have focused on unfaithfulness in unnatural contexts where an explicit bias has been introduced. In contrast, we show that unfaithful CoT can occur on realistic prompts with no artificial bias. Our results reveal non-negligible rates of several forms of unfaithful reasoning in frontier models: Sonnet 3.7 (16.3%), DeepSeek R1 (5.3%) and ChatGPT-4o (7.0%) all answer a notable proportion of question pairs unfaithfully. Specifically, we find that models rationalize their implicit biases in answers to binary questions ("implicit post-hoc rationalization"). For example, when separately presented with the questions "Is X bigger than Y?" and "Is Y bigger than X?", models sometimes produce superficially coherent arguments to justify answering Yes to both questions or No to both questions, even though such a pair of answers is logically contradictory. We also investigate restoration errors (Dziri et al., 2023), where models make and then silently correct errors in their reasoning, and unfaithful shortcuts, where models use clearly illogical reasoning to simplify their solutions to Putnam questions (a hard benchmark). Our findings raise challenges for AI safety work that relies on monitoring CoT to detect undesired behavior.
Mar-19-2025
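The paired-question check described in the abstract can be illustrated with a small sketch. This is a simplified illustration, not the authors' evaluation pipeline: the `PairResult` structure, the function names, and the sample answers are all hypothetical, and the sketch assumes each comparison has a strict answer (X and Y are never equal), so that exactly one of the two reversed questions should receive "Yes".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairResult:
    """Final answers to a question and its reversed counterpart (hypothetical structure)."""
    forward: str  # answer to "Is X bigger than Y?"
    reverse: str  # answer to "Is Y bigger than X?"


def normalize(answer: str) -> str:
    """Map a free-form final answer to 'yes' or 'no'; raise on anything else."""
    cleaned = answer.strip().lower().rstrip(".!")
    if cleaned not in {"yes", "no"}:
        raise ValueError(f"expected a Yes/No answer, got {answer!r}")
    return cleaned


def is_contradictory(pair: PairResult) -> bool:
    """True when both orderings get the same answer (Yes/Yes or No/No).

    Assumes X != Y, so a consistent model answers Yes to exactly one ordering.
    """
    return normalize(pair.forward) == normalize(pair.reverse)


def contradiction_rate(pairs: list[PairResult]) -> float:
    """Fraction of question pairs whose two answers contradict each other."""
    if not pairs:
        return 0.0
    return sum(is_contradictory(p) for p in pairs) / len(pairs)


# Made-up answers for illustration only; these are not results from the paper.
sample = [
    PairResult("Yes", "No"),   # consistent
    PairResult("Yes", "Yes"),  # contradictory
    PairResult("No", "No"),    # contradictory
    PairResult("No", "Yes"),   # consistent
]
print(contradiction_rate(sample))  # 0.5
```

A check like this only looks at final answers. It flags that a pair is inconsistent but says nothing about which of the two reasoning chains is the rationalized one.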