Steering Large Language Model Activations in Sparse Spaces
