Codebook Features: Sparse and Discrete Interpretability for Neural Networks
