Cumulative Reasoning with Large Language Models

Zhang, Yifan; Yang, Jingqin; Yuan, Yang; Yao, Andrew Chi-Chih

Dec-1-2023 – arXiv.org Artificial Intelligence

While language models are powerful and versatile, they often fail to solve highly complex problems. This is because solving such problems requires deliberate thinking, for which training provides only minimal guidance. In this paper, we propose a new method called Cumulative Reasoning (CR), which uses language models cumulatively and iteratively to emulate human thought processes. By decomposing tasks into smaller components, CR streamlines problem solving, making it both more manageable and more effective. For logical inference tasks, CR consistently outperforms existing methods, with an improvement of up to 9.3%, and achieves an accuracy of 98.04% on the curated FOLIO wiki dataset. On the Game of 24, CR achieves an accuracy of 98%, a substantial improvement of 24% over the previous state-of-the-art method. Finally, on the MATH dataset, we establish new state-of-the-art results with 58.0% overall accuracy, surpassing the previous best approach by a margin of 4.2% and achieving a 43% relative improvement on the hardest level 5 problems (from 22.4% to 32.1%). Additionally, we extend Cumulative Reasoning to a Python code environment, deliberately omitting external aids such as retrieval and web browsing so as to isolate the LLM's intrinsic reasoning capabilities. In this setting, CR reaches an overall accuracy of 72.2% on the MATH dataset, a 38.8% relative improvement over the PAL method. Code is available at https://github.com/iiis-ai/cumulative-reasoning.
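The abstract describes CR only at a high level. As a hedged illustration, the sketch below shows a propose/verify/accumulate loop on the Game of 24, one of the benchmarks named above. The function names `propose`, `verify`, and `report` echo the proposer, verifier, and reporter roles described in the full paper, but they are hypothetical: here every role is a plain deterministic function (no language model is called), and the search is an ordinary depth-first search over verified intermediate states. It is a toy under these assumptions, not the authors' implementation.

```python
from fractions import Fraction
from itertools import combinations
import operator

# Exact arithmetic via Fraction so that steps like 8 / 3 are not rounded.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def propose(state):
    """Hypothetical proposer (stands in for an LLM): every way to merge two items.

    A state is a sorted tuple of (value, expression) pairs.
    """
    for i, j in combinations(range(len(state)), 2):
        rest = [state[k] for k in range(len(state)) if k not in (i, j)]
        # Try both operand orders, since - and / are not commutative.
        for left, right in ((state[i], state[j]), (state[j], state[i])):
            for sym in OPS:
                yield rest, left, sym, right


def verify(x, sym, y):
    """Hypothetical verifier: recompute the step exactly, rejecting division by zero."""
    if sym == "/" and y == 0:
        return None
    return OPS[sym](x, y)


def report(state, target):
    """Hypothetical reporter: return the expression once a single item equals target."""
    if len(state) == 1 and state[0][0] == target:
        return state[0][1]
    return None


def cumulative_solve(numbers, target=24):
    """Depth-first search that accumulates only verified intermediate states."""
    start = tuple(sorted((Fraction(n), str(n)) for n in numbers))
    frontier = [start]
    accumulated = set()  # value multisets already verified (the cumulative memory)
    while frontier:
        state = frontier.pop()
        answer = report(state, target)
        if answer is not None:
            return answer
        for rest, (x, ex), sym, (y, ey) in propose(state):
            value = verify(x, sym, y)
            if value is None:
                continue
            new_state = tuple(sorted(rest + [(value, f"({ex} {sym} {ey})")]))
            key = tuple(v for v, _ in new_state)
            if key not in accumulated:
                accumulated.add(key)
                frontier.append(new_state)
    return None  # no combination of the inputs reaches the target


if __name__ == "__main__":
    print(cumulative_solve([4, 9, 10, 13]))
    print(cumulative_solve([1, 1, 1, 1]))
```

For example, `cumulative_solve([4, 9, 10, 13])` returns a fully parenthesized expression string that evaluates to 24, while `cumulative_solve([1, 1, 1, 1])` returns `None`. Keying the accumulated memory on the multiset of values prunes states already reached by a different route. In this toy the proposer is exact, so `verify` only filters division by zero; in an LLM-driven version, the verifier's job would be to discard proposed steps that are wrong.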

Country:
- North America > United States
  - Ohio > Athens County
    - Athens (0.04)
  - New York > Bronx County
    - New York City (0.04)
  - Illinois > Cook County
    - Chicago (0.04)
  - California > San Francisco County
    - San Francisco (0.04)
- Europe
  - Western Europe (0.04)
  - France (0.04)
  - Greece (0.04)
  - United Kingdom (0.04)
- Asia
  - Middle East
    - Jordan (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
    - Iraq > Baghdad Governorate
      - Baghdad (0.04)
  - China > Shanghai
    - Shanghai (0.04)

Genre:
- Research Report (1.00)

Industry:
- Media (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine > Therapeutic Area
  - Oncology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Logic & Formal Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Cognitive Science > Problem Solving (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.31)