Transformer Key-Value Memories Are Nearly as Interpretable as Sparse Autoencoders
