Can Large Language Models Really Improve by Self-critiquing Their Own Plans?
Karthik Valmeekam, Matthew Marquez, Subbarao Kambhampati
arXiv.org Artificial Intelligence
There have been widespread claims about Large Language Models (LLMs) being able to successfully verify or self-critique their candidate solutions in reasoning problems in an iterative mode. Intrigued by those claims, in this paper we set out to investigate the verification/self-critiquing abilities of large language models in the context of planning. We evaluate a planning system that employs LLMs for both plan generation and verification. We assess the verifier LLM's performance against ground-truth verification, the impact of self-critiquing on plan generation, and the influence of varying feedback levels on system performance. Using GPT-4, a state-of-the-art LLM, for both generation and verification, our findings reveal that self-critiquing appears to diminish plan generation performance, especially when compared to systems with external, sound verifiers. The LLM verifier in this system produces a notable number of false positives, compromising the system's reliability. Additionally, the nature of feedback, whether binary or detailed, showed minimal impact on plan generation.
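The system the abstract describes is a generate-test loop in which one LLM instance proposes a plan and another instance of the same LLM verifies it, back-prompting the generator with either binary or detailed feedback. Below is a minimal sketch of that loop, not the authors' implementation: the `llm` callable, the prompt wording, and the `binary_feedback` flag are all illustrative assumptions.

```python
from typing import Callable

def self_critique_loop(
    problem: str,
    llm: Callable[[str], str],  # hypothetical LLM interface: prompt -> completion
    max_rounds: int = 10,
    binary_feedback: bool = True,
) -> str:
    """Iterative generate-verify loop where the same LLM both proposes
    a plan and critiques it (the setup the paper evaluates)."""
    plan = llm(f"Produce a plan for the following problem:\n{problem}")
    for _ in range(max_rounds):
        # Ask the LLM to act as the verifier for its own candidate plan.
        critique = llm(
            "Verify the plan below for the given problem. Answer VALID or INVALID"
            + ("" if binary_feedback else ", and if INVALID list the failing steps")
            + f".\nProblem:\n{problem}\nPlan:\n{plan}"
        )
        if critique.strip().upper().startswith("VALID"):
            return plan  # the LLM verifier accepted the plan (possibly a false positive)
        # Back-prompt: feed the critique to the generator and request a revision.
        plan = llm(
            f"Your previous plan was judged invalid:\n{critique}\n"
            f"Revise the plan for:\n{problem}"
        )
    return plan
```

Replacing the LLM verifier with an external, sound verifier (e.g., a classical plan validator) changes only the `critique` step; the paper's finding is that this substitution, not richer feedback, is what improves performance.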
Oct-12-2023