A Theoretical Understanding of Self-Correction through In-context Alignment

Mar-21-2026, 21:51:52 GMT–Neural Information Processing Systems

Recent studies show initial evidence that large language models (LLMs), like humans, can go beyond mimicking limited human experiences and improve their abilities purely by self-correction, i.e., correcting previous responses through self-examination, as seen in models like OpenAI o1. Nevertheless, little is known about how such capabilities arise. In this work, based on a simplified setup akin to an alignment task, we theoretically analyze self-correction from an in-context learning perspective, showing that when LLMs give relatively accurate self-examinations as rewards, they are capable of refining responses in context.

Genre:
- Research Report (0.60)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)