

Training Language Models to Self-Correct via Reinforcement Learning

Kumar, Aviral, Zhuang, Vincent, Agarwal, Rishabh, Su, Yi, Co-Reyes, John D, Singh, Avi, Baumli, Kate, Iqbal, Shariq, Bishop, Colton, Roelofs, Rebecca, Zhang, Lei M, McKinney, Kay, Shrivastava, Disha, Paduraru, Cosmin, Tucker, George, Precup, Doina, Behbahani, Feryal, Faust, Aleksandra

arXiv.org Artificial Intelligence

Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Existing approaches for training self-correction either require multiple models or rely on a more capable model or other forms of supervision. To this end, we develop a multi-turn online reinforcement learning (RL) approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data. To build SCoRe, we first show that variants of supervised fine-tuning (SFT) on offline model-generated correction traces are insufficient for instilling self-correction behavior. In particular, we observe that training via SFT either suffers from a distribution mismatch between the training data and the model's own responses or implicitly prefers only a certain mode of correction behavior that is often not effective at test time. SCoRe addresses these challenges by training under the model's own distribution of self-generated correction traces and using appropriate regularization to steer the learning process into learning a self-correction strategy that is effective at test time as opposed to simply fitting high-reward responses for a given prompt. This regularization prescribes running a first phase of RL on a base model to generate a policy initialization that is less susceptible to collapse and then using a reward bonus to amplify self-correction during training. When applied to Gemini 1.0 Pro and 1.5 Flash models, we find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% and 9.1% respectively on the MATH and HumanEval benchmarks.
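The regularization the abstract describes rewards the final answer while giving an extra bonus when the second attempt fixes a wrong first attempt. As a minimal sketch (not the paper's implementation; `score_reward` and the `bonus` weight are hypothetical names), the shaped reward for a two-turn (attempt, revision) trace might look like:

```python
def score_reward(first_correct: bool, second_correct: bool,
                 bonus: float = 0.5) -> float:
    """Shaped reward for a two-turn self-correction trace.

    Base reward is the correctness of the final (revised) answer.
    A bonus is added only when the revision flips an incorrect first
    attempt into a correct one, steering the policy toward genuine
    self-correction rather than merely repeating an answer that was
    already correct.
    """
    base = 1.0 if second_correct else 0.0
    improved = second_correct and not first_correct
    return base + (bonus if improved else 0.0)
```

Under this shaping, a trace that corrects an error (reward 1.5) outranks one that was right from the start (reward 1.0), so the RL objective prefers traces that exhibit correction behavior rather than collapsing to a single-turn strategy.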


Large Language Models Can Self-Correct with Minimal Effort

Wu, Zhenyu, Zeng, Qingkai, Zhang, Zhihan, Tan, Zhaoxuan, Shen, Chao, Jiang, Meng

arXiv.org Artificial Intelligence

Intrinsic self-correction is a method that instructs large language models (LLMs) to verify and correct their responses without external feedback. Unfortunately, prior work concluded that LLMs cannot yet self-correct reasoning this way. We find that a simple yet effective verification method can unleash the inherent capabilities of LLMs: mask a key condition in the question, append the current response to construct a verification question, and predict the masked condition to verify the response. The condition can be an entity in an open-domain question or a numeric value in a math question, and identifying it requires minimal effort (via prompting). We propose ProCo, an iterative verify-then-correct framework that progressively identifies and corrects (probably) false responses. We conduct experiments on three reasoning tasks. On average, with GPT-3.5-Turbo as the backend LLM, ProCo yields $+6.8$ exact match on four open-domain question answering datasets, $+14.1$ accuracy on three arithmetic reasoning datasets, and $+9.6$ accuracy on a commonsense reasoning dataset, compared to Self-Correct.
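The verify-then-correct loop can be sketched as follows. This is an illustrative skeleton, not ProCo's actual implementation: `solve` and `predict_condition` stand in for the LLM calls (answering the question, and predicting the masked condition given a candidate answer), and the numeric-value masking is the simplest case described in the abstract.

```python
import re

def mask_condition(question: str):
    """Mask the first numeric value in the question.

    Returns (masked_question, masked_value); value is None if the
    question contains no number to mask.
    """
    m = re.search(r"\d+", question)
    if m is None:
        return question, None
    masked = question[:m.start()] + "[MASK]" + question[m.end():]
    return masked, m.group()

def proco(question, solve, predict_condition, max_iters=3):
    """Iterative verify-then-correct loop (ProCo-style sketch).

    solve(question, feedback=...) produces a candidate answer;
    predict_condition(masked_question, answer) predicts the masked
    condition from the answer. If the prediction matches the true
    masked value, the answer is accepted; otherwise the answer is
    treated as probably false and regenerated.
    """
    masked, value = mask_condition(question)
    answer = solve(question, feedback=None)
    for _ in range(max_iters):
        if value is None or predict_condition(masked, answer) == value:
            return answer  # verification passed
        # verification failed: ask for a new answer, flagging the old one
        answer = solve(question, feedback=answer)
    return answer
```

The loop only commits to a response once the model can recover the hidden condition from it, which is the paper's cheap proxy for checking the answer's consistency with the question.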


Large Language Models Cannot Self-Correct Reasoning Yet

Huang, Jie, Chen, Xinyun, Mishra, Swaroop, Zheng, Huaixiu Steven, Yu, Adams Wei, Song, Xinying, Zhou, Denny

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications. Nevertheless, concerns persist regarding the accuracy and appropriateness of their generated content. A contemporary methodology, self-correction, has been proposed as a remedy to these issues. Building upon this premise, this paper critically examines the role and efficacy of self-correction within LLMs, shedding light on its true potential and limitations. Central to our investigation is the notion of intrinsic self-correction, whereby an LLM attempts to correct its initial responses based solely on its inherent capabilities, without the crutch of external feedback. In the context of reasoning, our research indicates that LLMs struggle to self-correct their responses without external feedback, and at times, their performance might even degrade post self-correction. Drawing from these insights, we offer suggestions for future research and practical applications in this field.
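The intrinsic self-correction setup the paper examines is a pure prompting loop: answer, self-critique, revise, with no external signal. A minimal sketch (the prompt wording and the `llm` prompt-to-text function are hypothetical, not the paper's exact protocol):

```python
def intrinsic_self_correct(question, llm, rounds=1):
    """Intrinsic self-correction: the model critiques and revises its
    own answer using only its inherent capabilities, with no external
    feedback. `llm` is any prompt -> text callable.
    """
    answer = llm(f"Q: {question}\nA:")
    for _ in range(rounds):
        # Self-critique: the model reviews its own previous answer.
        critique = llm(f"Q: {question}\nYour answer: {answer}\n"
                       "Review your answer and find problems with it.")
        # Revision: the model answers again in light of its critique.
        answer = llm(f"Q: {question}\nYour answer: {answer}\n"
                     f"Critique: {critique}\n"
                     "Based on the critique, give the final answer.")
    return answer
```

The paper's finding is that on reasoning tasks this loop often fails to help, and the revised answer can be worse than the original, which is precisely why the absence of external feedback matters.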


Building a 3D Printer That Self-Corrects With AI

#artificialintelligence

Most 3D-printed objects are prototypes or one-off creations, in large part because 3D printing is more finicky than traditional manufacturing. Because the process works by adding layers of material atop each other, subtle changes in temperature or material quality can result in imperfections and hours of lost work. Inkbit, a Boston-area 3D printing company, is using machine vision and artificial intelligence to help its equipment course-correct. Javier Ramos, co-founder and director of hardware at Inkbit, said Inkbit's machine vision technology instantly scans the objects it prints, relying on AI to correct any mistakes made. He imagines a future where Inkbit's tech is used on every factory floor, printing out millions of products more cheaply -- and faster -- than traditional manufacturing processes ever could.