Sparse Autoencoders Can Interpret Randomly Initialized Transformers

Open in new window