Mechanistic Interpretability of Socio-Political Frames in Language Models
