Formal Language Knowledge Corpus for Retrieval Augmented Generation

Zayyad, Majd, Adi, Yossi

arXiv.org Artificial Intelligence 

Abstract-The integration of retrieval-augmented techniques with LLMs has shown promise in improving performance across various domains. However, their utility in tasks requiring advanced reasoning, such as generating and evaluating mathematical statements and proofs, remains underexplored. This study explores the use of Lean, a programming language for writing mathematical proofs, to populate the knowledge corpus used by RAG systems. We hope for this to lay the foundation to exploring different methods of using RAGs to improve the performance of LLMs in advanced logical reasoning tasks. Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, but they still face significant challenges, particularly in generating accurate and reliable information. One of the key issues is their tendency to produce hallucinated or incorrect responses.