Logic & Formal Reasoning
Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability
Wang, Ruida, Li, Yuxin, Fung, Yi R., Zhang, Tong
Enhancing the mathematical reasoning capabilities of LLMs has garnered significant attention in both the mathematical and computer science communities. Recent works have made substantial progress in both Natural Language (NL) reasoning and Formal Language (FL) reasoning by leveraging the potential of pure Reinforcement Learning (RL) methods on base models. However, RL approaches struggle to impart new capabilities not present in the base model, highlighting the need to integrate more knowledge like FL into NL math reasoning effectively. Yet, this integration is challenging due to inherent disparities in problem structure and reasoning format between NL and FL. To address these challenges, we introduce **NL-FL HybridReasoning (NFL-HR)**, an end-to-end framework designed to incorporate the FL expert into NL math problem-solving. To bridge the NL and FL input format gap, we propose the NL-FL Problem Alignment method, which reformulates the Question-Answering (QA) problems in NL as existence theorems in FL. Subsequently, the Mixed Problem Input technique we provide enables the FL reasoner to handle both QA and existence problems concurrently. Lastly, we mitigate the NL and FL output format gap in reasoning through an LLM-based Answer Extraction mechanism. Comprehensive experiments demonstrate that the NFL-HR framework achieves **89.80%** and **84.34%** accuracy rates on the MATH-500 and the AMC benchmarks, surpassing the NL baseline by **4.60%** and **4.82%**, respectively. Notably, some problems resolved by our framework remain unsolved by the NL baseline model even under a larger number of trials.
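To illustrate the idea of restating a QA problem as an existence theorem, here is a minimal Lean 4 sketch. The problem and the theorem name `exists_answer` are hypothetical and not taken from the paper; the sketch only shows that the witness supplied in the proof plays the role of the answer.

```lean
-- Hypothetical QA problem: "Which natural number x satisfies 2x + 3 = 11?"
-- Restated as an existence theorem; proving it requires supplying the answer as a witness.
theorem exists_answer : ∃ x : Nat, 2 * x + 3 = 11 :=
  ⟨4, rfl⟩  -- witness x = 4; the equation 2 * 4 + 3 = 11 holds by computation
```

An answer-extraction step, as described in the abstract, would then read the witness (here `4`) back out of the formal proof.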
Adaptive Selection of Symbolic Languages for Improving LLM Logical Reasoning
Wang, Xiangyu, Yang, Haocheng, Cheng, Fengxiang, Liu, Fenrong
Large Language Models (LLMs) still struggle with complex logical reasoning. While previous works achieve remarkable improvements, their performance is highly dependent on the correctness of translating natural language (NL) problems into a symbolic language (SL). Though numerous works focus on improving this translation accuracy, they only consider the similarity between the meaning of SL and NL, overlooking another crucial influencing factor, the selection of the target SL type itself. For example, first-order logic language specializes in logical reasoning with categorical syllogisms and complex quantifiers, while Boolean satisfiability formalism excels at representing constraint satisfaction like partial problems. To our knowledge, this is the first paper to claim and verify that different NL logical reasoning problems correspond to different optimal SL formalizations for translation. Based on this, we propose a method to improve the logical reasoning performance of LLMs by adaptively selecting the most suitable SL for each problem prior to translation. Specifically, we leverage LLMs to select the target SL among first-order logic, logic programming and Boolean satisfiability, then translate the NL problem into expressions in the target SL and employ the corresponding logical solver to derive the final answer. Experimental results on benchmarks show that our adaptive selection method significantly outperforms translating all problems into a single SL and randomly selecting the SL. On a mixed dataset of these benchmarks, our approach achieves 96% accuracy, improving performance by 25% compared to the second highest accuracy from the first-order logic translation.
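The select-translate-solve pipeline described above can be sketched as a small dispatcher. This is only a structural sketch under stated assumptions: the function name `solve_adaptively`, the language keys, and the callables passed in are all hypothetical, and in the paper's setting the selector and translators would be LLM calls and the solvers would be external logical solvers.

```python
from typing import Callable, Dict

# Hypothetical labels for the three symbolic languages named in the abstract.
SYMBOLIC_LANGUAGES = ("first_order_logic", "logic_programming", "boolean_sat")


def solve_adaptively(
    problem: str,
    select_sl: Callable[[str], str],
    translators: Dict[str, Callable[[str], str]],
    solvers: Dict[str, Callable[[str], str]],
) -> str:
    """Pick a symbolic language for the problem, translate, then solve.

    All callables are supplied by the caller; nothing here is the paper's code.
    """
    sl = select_sl(problem)  # step 1: choose the target SL before translating
    if sl not in translators or sl not in solvers:
        raise ValueError(f"no translator/solver registered for {sl!r}")
    formal = translators[sl](problem)  # step 2: NL -> chosen SL
    return solvers[sl](formal)  # step 3: run the matching solver
```

Passing the selector, translators, and solvers as parameters keeps the sketch independent of any particular LLM API or solver binding.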
Extended Triangular Method: A Generalized Algorithm for Contradiction Separation Based Automated Deduction
Xu, Yang, Chen, Shuwei, Liu, Jun, Cao, Feng, He, Xingxing
Automated deduction lies at the core of Artificial Intelligence (AI), underpinning theorem proving, formal verification, and logical reasoning. Despite decades of progress, reconciling deductive completeness with computational efficiency remains an enduring challenge. Traditional reasoning calculi, grounded in binary resolution, restrict inference to pairwise clause interactions and thereby limit deductive synergy among multiple clauses. The Contradiction Separation Extension (CSE) framework, introduced in 2018, proposed a dynamic multi-clause reasoning theory that redefined logical inference as a process of contradiction separation rather than sequential resolution. While that work established the theoretical foundation, its algorithmic realization remained unformalized and unpublished. This work presents the Extended Triangular Method (ETM), a generalized contradiction-construction algorithm that formalizes and extends the internal mechanisms of contradiction separation. The ETM unifies multiple contradiction-building strategies, including the earlier Standard Extension method, within a triangular geometric framework that supports flexible clause interaction and dynamic synergy. ETM serves as the algorithmic core of several high-performance theorem provers, CSE, CSE-E, CSI-E, and CSI-Enig, whose competitive results in standard first-order benchmarks (TPTP problem sets and CASC 2018-2015) empirically validate the effectiveness and generality of the proposed approach. By bridging theoretical abstraction and operational implementation, ETM advances the contradiction separation paradigm into a generalized, scalable, and practically competitive model for automated reasoning, offering new directions for future research in logical inference and theorem proving.
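For context on the pairwise restriction mentioned above, here is a minimal propositional sketch of binary resolution, the classical rule that combines exactly two clauses at a time. It illustrates only the traditional baseline, not the ETM or contradiction separation; the clause encoding (integers for literals, negative for negation) and the function name `resolve` are assumptions made for this example.

```python
def resolve(c1, c2):
    """Return all binary resolvents of two propositional clauses.

    A clause is a frozenset of nonzero ints; -n denotes the negation of n.
    """
    resolvents = set()
    for lit in c1:
        if -lit in c2:
            # Drop the complementary pair and merge what remains.
            resolvent = (c1 - {lit}) | (c2 - {-lit})
            # Discard tautologies (clauses containing both p and not-p).
            if not any(-l in resolvent for l in resolvent):
                resolvents.add(frozenset(resolvent))
    return resolvents


# (p or q) and (not p or r) resolve on p to give (q or r).
print(resolve(frozenset({1, 2}), frozenset({-1, 3})))
# p and (not p) resolve to the empty clause, i.e. a contradiction.
print(resolve(frozenset({1}), frozenset({-1})))
```

Each call consumes two clauses and one complementary literal pair, which is the pairwise interaction that multi-clause approaches aim to generalize.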
Learning to Guarantee Type Correctness in Code Generation through Type-Guided Program Synthesis
Huang, Zhechong, Zhang, Zhao, Ji, Ruyi, Xia, Tingxuan, Zhu, Qihao, Cao, Qinxiang, Sun, Zeyu, Xiong, Yingfei
Language models have shown remarkable proficiency in code generation; nevertheless, ensuring type correctness remains a challenge. Although traditional methods, such as constrained decoding, alleviate this problem by externally rejecting untypable code, the model itself does not effectively learn type reasoning internally, which ultimately limits its overall performance. This paper introduces TyFlow, a novel system that internalizes type reasoning within code generation to guide the model to learn the type system. The core of our approach is a novel type-guided program synthesis system that maintains an isomorphism between type derivation trees and synthesis derivation trees, enabling a new code representation based on synthesis decision sequences rather than traditional text-based token sequences. By offloading the complexity of type system learning to the representation itself, models can redirect their computational resources toward higher-level program semantics. Our evaluation shows that TyFlow not only eliminates type errors but also significantly improves functional correctness, highlighting the importance of aligning LMs with type systems internally.
Curved Boolean Logic: A Contextual Generalization of Propositional Logic with Algorithmic Consequences
von Liechtenstein, Maximilian R. P.
Curved Boolean Logic (CBL) generalizes propositional logic by allowing local truth assignments that do not extend to a single global valuation, analogous to curvature in geometry. We give equivalent sheaf and exclusivity-graph semantics and a context-aware proof calculus that is conservative in the flat limit. We formalize CBL-SAT and basic complexity (NP-complete in general) and present operational operators (CBL-AC and CBL-CONS) that prune contradictions earlier on classical hardware. We model noise with iid, AR(1)-correlated, and adversarial bounded perturbations and provide permutation-based significance with Benjamini-Hochberg FDR control. A Colab-ready notebook (ancillary files) regenerates all figures and statistics. We position CBL relative to KCBS, CSW, and sheaf frameworks and outline links to SAT/CSP and robustness/adapter stability in large language models.
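The phrase "local truth assignments that do not extend to a single global valuation" can be illustrated with a brute-force check. This is a toy sketch, not the paper's CBL-SAT definition or its CBL-AC/CBL-CONS operators: the representation of a context as a tuple of variables plus a set of allowed local assignments, and the name `has_global_valuation`, are assumptions for this example.

```python
from itertools import product


def has_global_valuation(variables, contexts):
    """Check whether one Boolean valuation satisfies every context.

    contexts: list of (context_variables, allowed_local_assignments) pairs,
    where allowed_local_assignments is a set of tuples of booleans.
    """
    for values in product([False, True], repeat=len(variables)):
        valuation = dict(zip(variables, values))
        if all(
            tuple(valuation[v] for v in ctx_vars) in allowed
            for ctx_vars, allowed in contexts
        ):
            return True
    return False


# Each context demands that its two variables differ; every context is
# satisfiable on its own.
DIFFER = {(False, True), (True, False)}
odd_cycle = [(("a", "b"), DIFFER), (("b", "c"), DIFFER), (("a", "c"), DIFFER)]
chain = [(("a", "b"), DIFFER), (("b", "c"), DIFFER)]

print(has_global_valuation(["a", "b", "c"], odd_cycle))  # False
print(has_global_valuation(["a", "b", "c"], chain))      # True
```

In the odd cycle, `a != b` and `b != c` force `a == c`, which conflicts with the third context, so the locally consistent assignments cannot be glued into one global valuation.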
Aristotle: IMO-level Automated Theorem Proving
Achim, Tudor, Best, Alex, Bietti, Alberto, Der, Kevin, Fédérico, Mathïs, Gukov, Sergei, Halpern-Leistner, Daniel, Henningsgard, Kirsten, Kudryashov, Yury, Meiburg, Alexander, Michelsen, Martin, Patterson, Riley, Rodriguez, Eric, Scharff, Laura, Shanker, Vikram, Sicca, Vladmir, Sowrirajan, Hari, Swope, Aidan, Tamas, Matyas, Tenev, Vlad, Thomm, Jonathan, Williams, Harold, Wu, Lawrence
We introduce Aristotle, an AI system that combines formal verification with informal reasoning, achieving gold-medal-equivalent performance on the 2025 International Mathematical Olympiad problems. Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver. Our system demonstrates state-of-the-art performance with favorable scaling properties for automated theorem proving.