SELF-[IN]CORRECT: LLMs Struggle with Refining Self-Generated Responses
Dongwei Jiang, Jingyu Zhang, Orion Weller, Nathaniel Weir, Benjamin Van Durme, Daniel Khashabi
Can LLMs continually improve their previous outputs for better results? An affirmative answer would require LLMs to be better at discriminating among their previously generated alternatives than at generating initial responses. We explore the validity of this hypothesis in practice. We first introduce a unified framework for comparing the generative and discriminative capabilities of any model on any task. In our subsequent experimental analysis of several LLMs, we do not observe their discrimination performance to be reliably better than their generation performance. We hope these findings inform the growing literature on self-improving AI systems.
arXiv.org Artificial Intelligence
Apr-4-2024
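
The framework the abstract describes can be pictured as comparing two accuracies over the same task set: how often a model generates a correct answer outright, versus how often it can pick a correct answer out of its own previously generated candidates. The sketch below is a minimal illustration of that comparison, assuming a black-box `model` callable, a task-specific `is_correct` checker, and a `choose` function that queries the model as a judge over its own outputs; all names and the candidate-selection protocol here are illustrative assumptions, not the paper's actual implementation.

```python
from typing import Callable, Dict, List


def generation_accuracy(
    model: Callable[[str], str],
    tasks: List[Dict],
    is_correct: Callable[[str, Dict], bool],
) -> float:
    """Fraction of tasks the model solves in a single generation pass."""
    hits = sum(is_correct(model(task["prompt"]), task) for task in tasks)
    return hits / len(tasks)


def discrimination_accuracy(
    model: Callable[[str], str],
    choose: Callable[[str, List[str]], int],
    tasks: List[Dict],
    is_correct: Callable[[str, Dict], bool],
    n_candidates: int = 4,
) -> float:
    """Fraction of tasks where the model, acting as its own judge,
    selects a correct answer from its previously generated candidates."""
    hits = 0
    for task in tasks:
        # Sample several of the model's own responses, then ask the model
        # (via `choose`) to pick the best one among them.
        candidates = [model(task["prompt"]) for _ in range(n_candidates)]
        picked = candidates[choose(task["prompt"], candidates)]
        hits += is_correct(picked, task)
    return hits / len(tasks)
```

On this reading, self-refinement pays off only when `discrimination_accuracy` reliably exceeds `generation_accuracy`, which is precisely the comparison the abstract reports as not holding in the authors' experiments.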