Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
