Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models

Open in new window