Confident RAG: Enhancing the Performance of LLMs for Mathematics Question Answering through Multi-Embedding and Confidence Scoring
Chen, Shiting, Zhao, Zijian, Chen, Jinsong
arXiv.org Artificial Intelligence
Abstract: Large Language Models (LLMs) hold significant promise for mathematics education, yet they often struggle with complex mathematical reasoning. While Retrieval-Augmented Generation (RAG) mitigates these issues by grounding LLMs in external knowledge, its effectiveness remains unstable and depends heavily on the choice of a single embedding model. Moving beyond static RAG workflows, we draw on agentic workflow patterns, a paradigm that introduces structured task decomposition and collaboration to enhance system performance. We propose and examine two novel approaches that combine the benefits of multiple embedding models. While our Mixture-Embedding RAG approach (fusing retrieved documents) shows limited gains, our Confident RAG method (generating multiple answers and selecting the one with the highest confidence score) demonstrates significant improvement. Experimental results show that Confident RAG achieved average accuracy improvements of approximately 10% over vanilla LLMs and 5% over vanilla RAG. The consistent results across different LLMs and embedding models indicate that Confident RAG is an efficient plug-and-play solution for trustworthy mathematical AI assistants. Finally, we discuss how this work lays the groundwork for deploying Agentic RAG systems in educational settings, where autonomous planning and iterative refinement can be built upon our robust retrieval foundation.

Large language models (LLMs) have demonstrated remarkable capabilities across various domains [1]-[3], showing particular promise for educational applications. However, their tendency to hallucinate [4] remains a significant barrier to reliable use in learning environments, especially in mathematics education where accuracy is crucial [5].
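As a rough illustration of the Confident RAG idea summarized in the abstract (generate one answer per embedding model, then keep the answer with the highest confidence score), here is a minimal Python sketch. It is not the authors' implementation: the names `Candidate`, `Retriever`, `Generator`, and `confident_rag` are hypothetical, and the way the confidence score is computed is not stated in this excerpt, so the sketch simply assumes the caller's generator returns one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    embedder: str      # name of the embedding model used for retrieval
    answer: str        # answer generated from that model's retrieved context
    confidence: float  # confidence score (the metric itself is an assumption)


# Hypothetical interfaces, supplied by the caller:
# a retriever maps a question to a list of passages;
# a generator maps (question, passages) to (answer, confidence).
Retriever = Callable[[str], List[str]]
Generator = Callable[[str, List[str]], Tuple[str, float]]


def confident_rag(
    question: str,
    retrievers: Dict[str, Retriever],
    generate: Generator,
) -> Candidate:
    """Run one RAG pass per embedding model and keep the most confident answer."""
    if not retrievers:
        raise ValueError("at least one retriever is required")

    candidates = []
    for name, retrieve in retrievers.items():
        passages = retrieve(question)                      # retrieval with this embedder
        answer, confidence = generate(question, passages)  # answer plus its score
        candidates.append(Candidate(name, answer, confidence))

    # Selection step: the highest confidence score wins.
    return max(candidates, key=lambda c: c.confidence)
```

Because the retrievers and the generator are passed in as plain callables, the selection logic stays independent of any particular LLM or embedding library, which matches the plug-and-play framing in the abstract.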
Dec-2-2025
- Country:
- Asia
- China > Hong Kong (0.05)
- Middle East > Jordan (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Education
- Curriculum > Subject-Specific Education (0.54)
- Educational Setting (0.48)
- Technology: