On the Intersection of Self-Correction and Trust in Language Models
WARNING: This paper contains model outputs that may be considered offensive. Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex cognitive tasks. However, their complexity and lack of transparency have raised several trustworthiness concerns, including the propagation of misinformation and toxicity. Recent research has explored the self-correction capabilities of LLMs to enhance their performance. In this work, we investigate whether these self-correction capabilities can be harnessed to improve the trustworthiness of LLMs. We conduct experiments focusing on two key aspects of trustworthiness: truthfulness and toxicity. Our findings reveal that self-correction can reduce toxicity and improve truthfulness, but the extent of these improvements varies depending on the specific aspect of trustworthiness and the nature of the task. Interestingly, our study also uncovers instances of "self-doubt" in LLMs during the self-correction process, introducing a new set of challenges that need to be addressed.

Large Language Models (LLMs) have emerged as a powerful tool in the field of artificial intelligence, demonstrating remarkable capabilities in performing complex cognitive tasks (Zhao et al., 2023b). These models, trained on vast amounts of data, can generate human-like text, translate languages, answer questions, and even write code (Wei et al., 2022a).
arXiv.org Artificial Intelligence
Nov-5-2023