Cumulative Reasoning with Large Language Models
Zhang, Yifan; Yang, Jingqin; Yuan, Yang; Yao, Andrew Chi-Chih
arXiv.org Artificial Intelligence
While language models are powerful and versatile, they often fail to solve highly complex problems. This is because solving complex problems requires deliberate thinking, which has been only minimally guided during training. In this paper, we propose a new method called Cumulative Reasoning (CR), which employs language models in a cumulative and iterative manner to emulate human thought processes. By decomposing tasks into smaller components, CR streamlines the problem-solving process, making it both more manageable and more effective. For logical inference tasks, CR consistently outperforms existing methods, with improvements of up to 9.3%, and achieves an accuracy of 98.04% on the curated FOLIO wiki dataset. On the Game of 24, CR achieves an accuracy of 98%, an improvement of 24 percentage points over the previous state-of-the-art method. Finally, on the MATH dataset, we establish new state-of-the-art results with 58.0% overall accuracy, surpassing the previous best approach by a margin of 4.2% and achieving a 43% relative improvement on the hardest, level-5 problems (from 22.4% to 32.1%). Additionally, we extend Cumulative Reasoning to incorporate a Python code environment, deliberately omitting external aids such as retrieval and web browsing in order to focus solely on the LLM's intrinsic reasoning capabilities. In this setting, CR reaches an overall accuracy of 72.2% on the MATH dataset, a 38.8% relative improvement over the PAL method. Code is available at https://github.com/iiis-ai/cumulative-reasoning.
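The cumulative, iterative loop described above can be illustrated with a toy sketch that does not use an LLM. It assumes the proposer/verifier/reporter split described in the full paper. Every name here is a hypothetical deterministic stand-in for a model call, not code from the authors' repository: `propose`, `verify`, `report`, `cumulative_reasoning`, and the cat/mammal rule set. Each round builds on every proposition accepted so far, and the context only ever grows. The proposer deliberately also suggests invalid converse inferences, so the verifier has something to reject.

```python
from dataclasses import dataclass

# A fact is (predicate, entity), e.g. ("cat", "tom") for "Tom is a cat".
Fact = tuple[str, str]


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule form: every <premise> is a <conclusion>."""
    premise: str
    conclusion: str


def propose(facts: set[Fact], rules: list[Rule]) -> list[Fact]:
    """Stand-in for the LLM proposer: suggest new facts from the current context.

    It also proposes the converse of each rule (an invalid inference) to imitate
    an unreliable model, so the verifier has something to filter out.
    """
    candidates: list[Fact] = []
    for pred, entity in sorted(facts):
        for rule in rules:
            if rule.premise == pred:         # valid: cat(tom), cat->mammal => mammal(tom)
                new = (rule.conclusion, entity)
            elif rule.conclusion == pred:    # invalid converse: mammal(tom) => dog(tom)
                new = (rule.premise, entity)
            else:
                continue
            if new not in facts and new not in candidates:
                candidates.append(new)
    return candidates


def verify(candidate: Fact, facts: set[Fact], rules: list[Rule]) -> bool:
    """Stand-in for the LLM verifier: accept only facts derivable in one valid step."""
    conclusion, entity = candidate
    return any(r.conclusion == conclusion and (r.premise, entity) in facts for r in rules)


def report(hypothesis: Fact, facts: set[Fact]) -> bool:
    """Stand-in for the reporter: succeed once the hypothesis is in the context."""
    return hypothesis in facts


def cumulative_reasoning(
    premises: set[Fact], rules: list[Rule], hypothesis: Fact, max_rounds: int = 10
) -> tuple[bool, list[Fact], list[Fact]]:
    facts = set(premises)                    # cumulative context: never shrinks
    accepted: list[Fact] = []
    rejected: list[Fact] = []
    for _ in range(max_rounds):
        if report(hypothesis, facts):
            break
        proposals = propose(facts, rules)
        verified = [c for c in proposals if verify(c, facts, rules)]
        rejected.extend(c for c in proposals if c not in verified and c not in rejected)
        if not verified:
            break                            # nothing new can be derived
        facts.update(verified)               # accepted steps feed the next round
        accepted.extend(verified)
    return report(hypothesis, facts), accepted, rejected


rules = [Rule("cat", "mammal"), Rule("mammal", "animal"), Rule("dog", "mammal")]
print(cumulative_reasoning({("cat", "tom")}, rules, ("animal", "tom")))
# (True, [('mammal', 'tom'), ('animal', 'tom')], [('dog', 'tom')])
```

The verifier rejects `("dog", "tom")`, which the proposer reached by reversing the rule `dog -> mammal`. Only verified steps enter the context that the next round builds on. In the actual method, each of these roles would be an LLM prompt rather than a hand-written rule check.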
Dec-1-2023