Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers

Patrik Zavoral, Dušan Variš, Ondřej Bojar

arXiv.org Artificial Intelligence 

The Transformer model has a tendency to overfit various aspects of the training data, such as the overall sequence length. We study elementary string edit functions using a defined set of error indicators to interpret the behaviour of the sequence-to-sequence Transformer. We show that generalization to shorter sequences is often possible, but confirm that longer sequences are highly problematic, although partially correct answers are often obtained. Additionally, we find that other structural characteristics of the sequences, such as subsegment length, may be equally important.

We study length-based generalization, whereby the novel out-of-distribution condition is induced solely by controlling the length range of the sequences in the training and validation sets. This type of generalization is especially apparent in tasks where the pattern is elementary, and therefore easily identifiable by humans. For example, when we illustrate the operation of string reversal on short strings, humans will correctly reverse even a long string. Such elementary string edit functions thus highlight the extent to which universal approximators may be limited by data. The elementary functions we experiment with are solvable using very small Transformers (1-2 layers, 1 attention head; Weiss et al., 2021) and it is possible to construct such Transformers by hand.
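To make the length-based setup concrete, here is a minimal sketch of how such a split could be generated for the string-reversal function. All names, the alphabet, and the length ranges are our own illustrative choices, not the paper's experimental configuration.

```python
import random

ALPHABET = "abcdefghij"  # hypothetical character vocabulary

def reversal_pair(length: int) -> tuple[str, str]:
    """One (source, target) example of the string-reversal function."""
    src = "".join(random.choices(ALPHABET, k=length))
    return src, src[::-1]

def make_split(lengths: range, n: int) -> list[tuple[str, str]]:
    """n examples whose lengths are drawn uniformly from `lengths`."""
    return [reversal_pair(random.choice(lengths)) for _ in range(n)]

random.seed(0)
train        = make_split(range(10, 21), 10_000)  # in-distribution lengths
test_shorter = make_split(range(1, 10),   1_000)  # OOD: shorter than training
test_longer  = make_split(range(21, 41),  1_000)  # OOD: longer than training
```

Training and validation stay inside a single length band, while the two test sets probe each out-of-distribution direction separately.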
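The "defined set of error indicators" is not reproduced in this excerpt; the three indicators below are hypothetical stand-ins for that kind of diagnostic, chosen to separate strict correctness from the partially correct answers the abstract mentions.

```python
def exact_match(hyp: str, ref: str) -> bool:
    """Strict indicator: the whole output matches the reference."""
    return hyp == ref

def length_error(hyp: str, ref: str) -> int:
    """Absolute deviation of the output length from the reference length."""
    return abs(len(hyp) - len(ref))

def position_accuracy(hyp: str, ref: str) -> float:
    """Fraction of reference positions reproduced correctly; gives
    partial credit where exact match gives none."""
    if not ref:
        return 1.0
    return sum(h == r for h, r in zip(hyp, ref)) / len(ref)
```

Aggregating indicators like these per length bucket would show whether failures on long sequences are total or merely partial.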
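The closing remark, that these functions fit into very small Transformers, can be illustrated with a hard-attention sketch: if position i attends exactly to position L-1-i, a single head copies the input in reverse. The construction below is our own NumPy illustration of that attention pattern, not a construction from Weiss et al. (2021); computing the target position L-1-i from positional information is precisely what a trained model must learn, and is where length generalization can fail.

```python
import numpy as np

def reverse_via_one_head(tokens: list[str]) -> list[str]:
    """String reversal as one hard-attention step: row i of the
    attention matrix puts all its weight on position L-1-i."""
    L = len(tokens)
    vocab = sorted(set(tokens))
    # One-hot value vector for each input position.
    values = np.eye(len(vocab))[[vocab.index(t) for t in tokens]]
    attention = np.eye(L)[::-1]  # A[i, j] = 1 iff j == L-1-i
    out = attention @ values     # row i holds the one-hot of tokens[L-1-i]
    return [vocab[int(row.argmax())] for row in out]

print(reverse_via_one_head(list("abcde")))  # ['e', 'd', 'c', 'b', 'a']
```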