Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models

Wang, Yuhui, Li, Changjiang, Chen, Guangke, Liang, Jiacheng, Wang, Ting

Sep-30-2025–arXiv.org Artificial Intelligence

Large reasoning models (LRMs) exhibit unprecedented capabilities in solving complex problems through Chain-of-Thought (CoT) reasoning. However, recent studies reveal that their final answers often contradict their own reasoning traces. We hypothesize that this inconsistency stems from two competing mechanisms for generating answers: CoT reasoning and memory retrieval. To test this hypothesis, we conduct controlled experiments that challenge LRMs with misleading cues during reasoning and/or corrupted answers during retrieval. Our results across models and datasets confirm that both mechanisms operate simultaneously, with their relative dominance influenced by multiple factors: problem domains, model scales, and fine-tuning approaches (e.g., reinforcement learning vs. distillation). The findings reveal a critical limitation in current reasoning fine-tuning paradigms: models can exploit the retrieval mechanism as a shortcut, effectively "hacking" the reward signal and undermining genuine reasoning development. To address this challenge, we introduce FARL, a novel fine-tuning framework that integrates memory unlearning with reinforcement learning. By carefully suppressing retrieval shortcuts during the fine-tuning process, FARL promotes reasoning-dominant behavior and enhances generalizable reasoning capabilities.
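The attribution idea in the abstract can be illustrated with a small tally. This is a minimal sketch, not the authors' evaluation protocol: it assumes that each test item records the model's final answer, the answer suggested by the misleading reasoning cue, and the corrupted answer planted for retrieval. The function names and labels (`attribute_answer`, `attribution_rates`, "reasoning", "retrieval") are hypothetical.

```python
from collections import Counter

LABELS = ("reasoning", "retrieval", "ambiguous", "neither")


def attribute_answer(final_answer, cue_answer, corrupted_answer):
    """Label which perturbed source the final answer agrees with.

    Assumption: agreeing with the misleading cue suggests the reasoning
    path dominated; agreeing with the corrupted answer suggests the
    retrieval path dominated.
    """
    matches_cue = final_answer == cue_answer
    matches_corrupted = final_answer == corrupted_answer
    if matches_cue and matches_corrupted:
        return "ambiguous"  # the two perturbations point to the same answer
    if matches_cue:
        return "reasoning"
    if matches_corrupted:
        return "retrieval"
    return "neither"


def attribution_rates(records):
    """Return the fraction of items assigned to each label.

    `records` is a list of (final_answer, cue_answer, corrupted_answer).
    """
    if not records:
        return {label: 0.0 for label in LABELS}
    counts = Counter(attribute_answer(*record) for record in records)
    return {label: counts[label] / len(records) for label in LABELS}


# Toy data, invented purely for illustration.
toy_records = [
    ("B", "B", "C"),  # follows the misleading cue
    ("C", "B", "C"),  # follows the corrupted answer
    ("A", "B", "C"),  # follows neither
    ("C", "B", "C"),  # follows the corrupted answer
]
rates = attribution_rates(toy_records)
```

Comparing such rates across problem domains, model scales, or fine-tuning approaches is one way the relative dominance of the two mechanisms might be summarized.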

large language model, machine learning, natural language

Country:
- Asia > Indonesia
  - Bali (0.04)
- Europe (0.04)
- North America > United States
  - New York > Suffolk County > Stony Brook (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Education > Curriculum
  - Subject-Specific Education (1.00)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (1.00)
- Law > Civil Rights & Constitutional Law (0.94)

Technology:
- Information Technology > Artificial Intelligence
  - Cognitive Science > Problem Solving (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language
    - Chatbot (0.70)
    - Large Language Model (1.00)
  - Representation & Reasoning (1.00)