Intrinsic Self-Correction in LLMs: Towards Explainable Prompting via Mechanistic Interpretability
