Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers
