The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

Open in new window