Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models

Li, Loka, Chen, Zhenhao, Chen, Guangyi, Zhang, Yixuan, Su, Yusheng, Xing, Eric, Zhang, Kun

May-13-2024–arXiv.org Artificial Intelligence

The recent success of Large Language Models (LLMs) has catalyzed an increasing interest in their self-correction capabilities. This paper presents a comprehensive investigation into the intrinsic self-correction of LLMs, attempting to address the ongoing debate about its feasibility. Our research has identified an important latent factor - the "confidence" of LLMs - during the self-correction process. Overlooking this factor may cause the models to over-criticize themselves, resulting in unreliable conclusions regarding the efficacy of self-correction. We have experimentally observed that LLMs possess the capability to understand the "confidence" in their own responses. It motivates us to develop an "If-or-Else" (IoE) prompting framework, designed to guide LLMs in assessing their own "confidence", facilitating intrinsic self-corrections. We conduct extensive experiments and demonstrate that our IoE-based Prompt can achieve a consistent improvement regarding the accuracy of self-corrected responses over the initial answers. Our study not only sheds light on the underlying factors affecting self-correction in LLMs, but also introduces a practical framework that utilizes the IoE prompting principle to efficiently improve self-correction capabilities with "confidence". The code is available at https://github.com/MBZUAI-CLeaR/IoE-Prompting.git.

arxiv preprint arxiv, critical prompt, final answer, (14 more...)

arXiv.org Artificial Intelligence

May-13-2024

arXiv.org PDF

Add feedback

Country:
- North America
  - United States
    - California (0.14)
    - Ohio (0.04)
    - Arkansas (0.04)
    - Virginia (0.04)
    - Maine (0.04)
    - Alabama (0.04)
    - Minnesota (0.04)
    - Utah (0.04)
    - Indiana (0.04)
    - Missouri (0.04)
    - Maryland (0.04)
    - Kansas (0.04)
    - North Carolina (0.04)
    - Tennessee (0.04)
    - New Hampshire (0.04)
    - Rhode Island (0.04)
    - Georgia (0.04)
    - Oklahoma (0.04)
    - Nebraska (0.04)
    - District of Columbia > Washington (0.04)
    - West Virginia
      - Cabell County > Huntington (0.04)
      - Kanawha County > Charleston (0.04)
    - Texas > Denton County
      - The Colony (0.05)
    - Illinois > Cook County
      - Chicago (0.04)
    - Massachusetts > Norfolk County
      - Dedham (0.05)
    - Kentucky > Jefferson County
      - Jeffersontown (0.04)
    - Florida > Bay County
      - Panama City (0.04)
    - Connecticut > Hartford County
      - Hartford (0.04)
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
  - Canada
    - British Columbia (0.14)
    - Saskatchewan (0.04)
    - Ontario > Toronto (0.04)
    - Alberta > Census Division No. 11
      - Edmonton Metropolitan Region > Edmonton (0.04)
- Europe
  - Norway (0.14)
  - Sweden (0.04)
  - Denmark (0.04)
- Asia > Middle East
  - Jordan (0.04)
  - UAE > Abu Dhabi Emirate
    - Abu Dhabi (0.14)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Consumer Products & Services > Restaurants (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.48)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found