Sparse Attention Post-Training for Mechanistic Interpretability
