Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs
