Interpreting Attention Layer Outputs with Sparse Autoencoders

Open in new window