TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning
–Neural Information Processing Systems
We propose a novel approach to interactive theorem proving (ITP) using deep reinforcement learning. The proposed framework learns proof search strategies as well as tactic and argument prediction in an end-to-end manner. We formulate the process of ITP as a Markov decision process (MDP) in which each state represents a set of potential derivation paths. This structure allows us to introduce a search mechanism that enables the agent to efficiently discard (predicted) dead-end derivations and restart from promising alternatives. We implement the framework in the HOL4 theorem prover. Experimental results show that the framework using learned search strategies outperforms existing automated theorem provers (i.e.
Apr-25-2026, 19:58:50 GMT