Disentangling Memory and Reasoning Ability in Large Language Models

Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang

arXiv.org Artificial Intelligence 

Recent advancements in Large Language Models (LLMs) have showcased their impressive inference capabilities in handling complex natural language tasks that require both extensive knowledge and sophisticated reasoning abilities (OpenAI, 2024; Touvron et al., 2023; Wei et al., 2022a). LLMs have demonstrated the ability to memorize vast amounts of knowledge, and techniques such as Chain-of-Thought (CoT) (Wei et al., 2022b) and Tree of Thoughts (ToT) (Yao et al., 2024) have been developed to further enhance their inference abilities by decomposing complex problems into several simpler, single-step processes. These methods enable LLMs to tackle multi-step inference tasks more effectively by organizing the thought process into discrete, focused actions (Feng et al., 2024; Jin et al., 2024b; Wei et al., 2022b).

However, despite these advancements, existing inference frameworks often operate as an opaque process without an explicit separation between knowledge retrieval and reasoning steps. This makes it unclear what specific knowledge the model utilizes and how it performs reasoning, leaving the decision-making process ambiguous. For complex, knowledge-intensive tasks, such as multi-hop inference, LLMs often struggle to effectively leverage their memory for inference (Yang et al., 2023; Jin et al., 2024b; Wang et al., 2024b; Cheng et al., 2024; Liu et al., 2024). Such tasks typically require the ability to recall relevant knowledge for each reasoning step (or "hop") and then perform inference over that recalled memory (Wang et al., 2024c). The lack of structured output and of effective memory utilization can lead to issues such as hallucination, where LLMs generate plausible but incorrect information (Xu et al., 2024; Li et al., 2024a), and "forgetting," where relevant information is lost across reasoning steps (Jin et al., 2024b; Chen & Shu, 2023), disrupting the logical flow.
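To make the distinction between memory recall and reasoning concrete, the sketch below shows one way a multi-hop question could be decomposed into alternating knowledge-recall and inference steps with an explicit trace. This is only an illustration of the general idea discussed above, not the method proposed in this paper; the `call_llm` callable, the prompts, and the `multi_hop_answer` helper are hypothetical placeholders.

```python
# Illustrative sketch: interleaving explicit knowledge-recall ("memory")
# and inference ("reason") steps for a multi-hop question, instead of
# folding both into one opaque chain of thought.
# NOTE: `call_llm`, the prompts, and this helper are hypothetical.

from typing import Callable, List, Tuple


def multi_hop_answer(question: str,
                     call_llm: Callable[[str], str],
                     max_hops: int = 3) -> Tuple[str, List[Tuple[str, str]]]:
    """Alternate between recalling a fact and reasoning over it,
    recording each step so the trace stays inspectable."""
    trace: List[Tuple[str, str]] = []  # (step_type, content) pairs
    context = ""

    for hop in range(max_hops):
        # Memory step: ask only for a relevant fact, with no inference.
        fact = call_llm(
            f"Question: {question}\nKnown so far: {context}\n"
            "Recall ONE relevant fact needed for the next step. "
            "Do not reason; state the fact only."
        )
        trace.append(("memory", fact))
        context += f"\nFact {hop + 1}: {fact}"

        # Reason step: infer only from the facts recalled so far.
        step = call_llm(
            f"Question: {question}\nFacts: {context}\n"
            "Using ONLY the facts above, state the next inference, or "
            "'FINAL: <answer>' if the question is now answerable."
        )
        trace.append(("reason", step))
        if step.startswith("FINAL:"):
            return step[len("FINAL:"):].strip(), trace

    return "unresolved", trace
```

Keeping the (memory, reason) trace explicit is what would allow one to inspect which recalled fact each inference relied on, which an undifferentiated CoT-style output does not expose.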