Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques

Neural Information Processing Systems 

Training (IIT) which we call Strict IIT (SIIT). SIIT models maintain Tracr's original circuit while being more realistic than the compiled Tracr models.