Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning
