Problem Solving
Posterior-GRPO: Rewarding Reasoning Processes in Code Generation
Fan, Lishui, Zhang, Yu, Chen, Mouxiang, Liu, Zhongxin
Reinforcement learning (RL) has significantly advanced code generation for large language models (LLMs). However, current paradigms rely on outcome-based rewards from test cases, neglecting the quality of the intermediate reasoning process. While supervising the reasoning process directly is a promising direction, it is highly susceptible to reward hacking, where the policy model learns to exploit the reasoning reward signal without improving final outcomes. To address this, we introduce a unified framework that can effectively incorporate the quality of the reasoning process during RL. First, to enable reasoning evaluation, we develop LCB-RB, a benchmark comprising preference pairs of superior and inferior reasoning processes. Second, to accurately score reasoning quality, we introduce an Optimized-Degraded based (OD-based) method for reward model training. This method generates high-quality preference pairs by systematically optimizing and degrading initial reasoning paths along curated dimensions of reasoning quality, such as factual accuracy, logical rigor, and coherence. A 7B parameter reward model with this method achieves state-of-the-art (SOT A) performance on LCB-RB and generalizes well to other benchmarks. Finally, we introduce Posterior-GRPO (P-GRPO), a novel RL method that conditions process-based rewards on task success. A 7B parameter model with P-GRPO achieves superior performance across diverse code generation tasks, outperforming outcome-only baselines by 4.5%, achieving comparable performance to GPT -4-Turbo. We further demonstrate the generalizability of our approach by extending it to mathematical tasks. Our models, dataset, and code are publicly available.
Reasoning Efficiently Through Adaptive Chain-of-Thought Compression: A Self-Optimizing Framework
Huang, Kerui, Liu, Shuhan, Hu, Xing, Xu, Tongtong, Bao, Lingfeng, Xia, Xin
Chain-of-Thought (CoT) reasoning enhances Large Language Models (LLMs) by prompting intermediate steps, improving accuracy and robustness in arithmetic, logic, and commonsense tasks. However, this benefit comes with high computational costs: longer outputs increase latency, memory usage, and KV-cache demands. These issues are especially critical in software engineering tasks where concise and deterministic outputs are required. To investigate these trade-offs, we conduct an empirical study based on code generation benchmarks. The results reveal that longer CoT does not always help. Excessive reasoning often causes truncation, accuracy drops, and latency up to five times higher, with failed outputs consistently longer than successful ones. These findings challenge the assumption that longer reasoning is inherently better and highlight the need for adaptive CoT control. Motivated by this, we propose SEER (Self-Enhancing Efficient Reasoning), an adaptive framework that compresses CoT while preserving accuracy. SEER combines Best-of-N sampling with task-aware adaptive filtering, dynamically adjusting thresholds based on pre-inference outputs to reduce verbosity and computational overhead. We then evaluate SEER on three software engineering tasks and one math task. On average, SEER shortens CoT by 42.1%, improves accuracy by reducing truncation, and eliminates most infinite loops. These results demonstrate SEER as a practical method to make CoT-enhanced LLMs more efficient and robust, even under resource constraints.
MIRA: Empowering One-Touch AI Services on Smartphones with MLLM-based Instruction Recommendation
Bian, Zhipeng, Zhu, Jieming, Xie, Xuyang, Dai, Quanyu, Zhao, Zhou, Dong, Zhenhua
The rapid advancement of generative AI technologies is driving the integration of diverse AI-powered services into smartphones, transforming how users interact with their devices. To simplify access to predefined AI services, this paper introduces MIRA, a pioneering framework for task instruction recommendation that enables intuitive one-touch AI tasking on smartphones. With MIRA, users can long-press on images or text objects to receive contextually relevant instruction recommendations for executing AI tasks. Our work introduces three key innovations: 1) A multimodal large language model (MLLM)-based recommendation pipeline with structured reasoning to extract key entities, infer user intent, and generate precise instructions; 2) A template-augmented reasoning mechanism that integrates high-level reasoning templates, enhancing task inference accuracy; 3) A prefix-tree-based constrained decoding strategy that restricts outputs to predefined instruction candidates, ensuring coherent and intent-aligned suggestions. Through evaluation using a real-world annotated datasets and a user study, MIRA has demonstrated substantial improvements in the accuracy of instruction recommendation. The encouraging results highlight MIRA's potential to revolutionize the way users engage with AI services on their smartphones, offering a more seamless and efficient experience.
Asterisk Operator
We propose the \textbf{Asterisk Operator} ($\ast$-operator), a novel unified framework for abstract reasoning based on Adjacency-Structured Parallel Propagation (ASPP). The operator formalizes structured reasoning tasks as local, parallel state evolution processes guided by implicit relational graphs. We prove that the $\ast$-operator maintains local computational constraints while achieving global reasoning capabilities, providing an efficient and convergent computational paradigm for abstract reasoning problems. Through rigorous mathematical analysis and comprehensive experiments on ARC2 challenges and Conway's Game of Life, we demonstrate the operator's universality, convergence properties, and superior performance. Our innovative Embedding-Asterisk distillation method achieves 100\% accuracy on ARC2 validation with only 6M parameters, representing a significant breakthrough in neural-symbolic reasoning. \textbf{Keywords:} Abstract Reasoning, Adjacency Structure, Parallel Propagation, Asterisk Operator, Convergence, Universal Approximation
FRIT: Using Causal Importance to Improve Chain-of-Thought Faithfulness
Swaroop, Anand, Nallani, Akshat, Uboweja, Saksham, Uzdenova, Adiliia, Nguyen, Michael, Zhu, Kevin, Dev, Sunishchal, Panda, Ashwinee, Sharma, Vasu, Chaudhary, Maheep
Chain-of-thought (CoT) reasoning has emerged as a powerful tool for improving large language model performance on complex tasks, but recent work shows that reasoning steps often fail to causally influence the final answer, creating brittle and untrustworthy outputs. Prior approaches focus primarily on measuring faithfulness, while methods for systematically improving it remain limited. We introduce Faithful Reasoning via Intervention Training (FRIT), a scalable alignment method that trains models to produce causally consistent reasoning by learning from systematically corrupted examples. FRIT generates synthetic training data by intervening on individual reasoning steps in model-generated CoTs, creating faithful/unfaithful pairs that highlight when reasoning breaks down. We then apply Direct Preference Optimization to teach models to prefer causally consistent reasoning paths. Evaluating on Qwen3-8B and Mistral-7B-v0.1 across factual and symbolic reasoning tasks, FRIT increases faithful reasoning by $3.4$ percentage points for Mistral on GSM8K while improving accuracy by $7.6$ percentage points. Our approach provides the first scalable, supervision-free method for training language models to produce more reliable and interpretable reasoning, addressing a critical gap between reasoning performance and trustworthiness. We release our code at \href{https://github.com/Anut-py/frit}.
Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
Qin, Luozheng, Gong, Jia, Sun, Yuqing, Li, Tianjiao, Yang, Mengping, Yang, Xiaomeng, Qu, Chao, Tan, Zhiyu, Li, Hao
Chain-of-Thought (CoT) reasoning has been widely adopted to enhance Large Language Models (LLMs) by decomposing complex tasks into simpler, sequential subtasks. However, extending CoT to vision-language reasoning tasks remains challenging, as it often requires interpreting transitions of visual states to support reasoning. Existing methods often struggle with this due to limited capacity of modeling visual state transitions or incoherent visual trajectories caused by fragmented architectures. To overcome these limitations, we propose Uni-CoT, a Unified Chain-of-Thought framework that enables coherent and grounded multimodal reasoning within a single unified model. The key idea is to leverage a model capable of both image understanding and generation to reason over visual content and model evolving visual states. However, empowering a unified model to achieve that is non-trivial, given the high computational cost and the burden of training. To address this, Uni-CoT introduces a novel two-level reasoning paradigm: A Macro-Level CoT for high-level task planning and A Micro-Level CoT for subtask execution. This design significantly reduces the computational overhead. Furthermore, we introduce a structured training paradigm that combines interleaved image-text supervision for macro-level CoT with multi-task objectives for micro-level CoT. Together, these innovations allow Uni-CoT to perform scalable and coherent multi-modal reasoning. Furthermore, thanks to our design, all experiments can be efficiently completed using only 8 A100 GPUs with 80GB VRAM each. Experimental results on reasoning-driven image generation benchmark (WISE) and editing benchmarks (RISE and KRIS) indicates that Uni-CoT demonstrates SOTA performance and strong generalization, establishing Uni-CoT as a promising solution for multi-modal reasoning. Project Page and Code: https://sais-fuxi.github.io/projects/uni-cot/
Causal-Symbolic Meta-Learning (CSML): Inducing Causal World Models for Few-Shot Generalization
Modern deep learning models excel at pattern recognition but remain fundamentally limited by their reliance on spurious correlations, leading to poor generalization and a demand for massive datasets. We argue that a key ingredient for human-like intelligence-robust, sample-efficient learning-stems from an understanding of causal mechanisms. In this work, we introduce Causal-Symbolic Meta-Learning (CSML), a novel framework that learns to infer the latent causal structure of a task distribution. CSML comprises three key modules: a perception module that maps raw inputs to disentangled symbolic representations; a differentiable causal induction module that discovers the underlying causal graph governing these symbols and a graph-based reasoning module that leverages this graph to make predictions. By meta-learning a shared causal world model across a distribution of tasks, CSML can rapidly adapt to novel tasks, including those requiring reasoning about interventions and counterfactuals, from only a handful of examples. We introduce CausalWorld, a new physics-based benchmark designed to test these capabilities. Our experiments show that CSML dramatically outperforms state-of-the-art meta-learning and neuro-symbolic baselines, particularly on tasks demanding true causal inference.
Shapes of Cognition for Computational Cognitive Modeling
McShane, Marjorie, Nirenburg, Sergei, Oruganti, Sanjay, English, Jesse
Shapes of cognition is a new conceptual paradigm for the computational cognitive modeling of Language - Endowed Intelligent Agents (LEIAs) . S hapes are remembered constellations of sensory, linguistic, conceptual, episodic, and procedural knowledge that allow agents to cut through the complexity of real life the same way as people do: by expecting things to be typical, recognizing patterns, acting by habit, reasoning by analogy, satisficing, and generally minimizing cognitive load to the degree situations permit . Atypical outcomes are treated using shapes - based recovery method s, such as learning on the fly, asking a human partner for help, or seeking an actionable, even if imperfect, situational understanding . Although shapes is an umbrella term, it is not vague: shapes - based modeling involves particular objectives, hypotheses, modeling strategies, knowledge bases, and actual models of wide - ranging phenomena, all implemented within a particular cognitive architecture . Such s pecificity is needed both to vet the our hypotheses and to achieve our practical aims of building useful agent systems that are explainable, extensible, and worthy of our trust, even in critical domains . However, a lthough the LEIA example of shapes - based modeling is specific, the principles can be applied more broadly, giving new life to knowledge - based and hybrid AI .
Large Language Models Imitate Logical Reasoning, but at what Cost?
McGinness, Lachlan, Baumgartner, Peter
We present a longitudinal study which evaluates the reasoning capability of frontier Large Language Models over an eighteen month period. We measured the accuracy of three leading models from December 2023, September 2024 and June 2025 on true or false questions from the PrOntoQA dataset and their faithfulness to reasoning strategies provided through in-context learning. The improvement in performance from 2023 to 2024 can be attributed to hidden Chain of Thought prompting. The introduction of thinking models allowed for significant improvement in model performance between 2024 and 2025. We then present a neuro-symbolic architecture which uses LLMs of less than 15 billion parameters to translate the problems into a standardised form. We then parse the standardised forms of the problems into a program to be solved by Z3, an SMT solver, to determine the satisfiability of the query. We report the number of prompt and completion tokens as well as the computational cost in FLOPs for open source models. The neuro-symbolic approach significantly reduces the computational cost while maintaining near perfect performance. The common approximation that the number of inference FLOPs is double the product of the active parameters and total tokens was accurate within 10\% for all experiments.
Physical Complexity of a Cognitive Artifact
Kardeş, Gülce, Krakauer, David, Grochow, Joshua
There are currently two well established domains for studying general problem solving. The first describes strategies used by humans on both experimental and real-world tasks. Human problem solving is captured through a number of frameworks including skill acquisition [1] and automaticity [2], the application of expert knowledge [3, 4], the use of heuristics [5, 6], reinforcement learning and conditioning [7, 8], Bayesian inference [9, 10], analogy making [11, 12], collective intelligence and cognition [13, 14], simulation intelligence [15], the use of external representations [16, 17] and the synergy of mind and matter through exbodiment [18]. The second domain, computational problem solving, investigates algorithms that enable computers to tackle problems effectively . Within this domain, two branches especially pertinent to the present question are: (i) computational complexity theory, which analyzes the resources (time, memory, etc.) required to solve problems as functions of input size, typically in the asymptotic limit; and (ii) the study of search algorithms, which seeks efficient solutions to specific tasks (e.g., games and puzzles) by exploiting the combinatorial structure of state spaces, often via heuristics [19-22].