Enhancing Automated Interpretability with Output-Centric Feature Descriptions
