Taming Polysemanticity in LLMs: Provable Feature Recovery via Sparse Autoencoders

Open in new window