Proof Recommendation System for the HOL4 Theorem Prover
Dekhil, Nour, Rashid, Adnan, Tahar, Sofiene
–arXiv.org Artificial Intelligence
We experimented with various transformer-based language models, such as BERT [9], RoBERTa [10], and T5 [11] for these datasets to identify the most effective model based on our evaluation. After splitting the restructured datasets into a 90-10 ratio for training and testing, we proceeded to train the selected models (block highlighted in yellow) using a grid search of hyperparameters optimization. Given the multitude of possible tactics available at each proof state, we chose to provide multiple recommendations for the next proof step. To assess the accuracy of these recommendations (block highlighted in green), we use the n-correctness rate, which measures the likelihood that a correct tactic from the testing dataset is among the top-n recommended tactics, where n signifies the number of recommended tactics evaluated against the correct tactic. We found out that RoBERTa demonstrated superior performance across most cases for n = 7.
arXiv.org Artificial Intelligence
Dec-31-2024