CausalGym: Benchmarking causal interpretability methods on linguistic tasks
