AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

Open in new window