GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering
