SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models
