Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders
