SAGE: An Agentic Explainer Framework for Interpreting SAE Features in Language Models
