Teaching Language Models Mechanistic Explainability Through Arrow-Pushing
