Dissecting Persona-Driven Reasoning in Language Models via Activation Patching
