A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
