Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
