Attention-Only Transformers and Implementing MLPs with Attention Heads
Robert Huben, Valerie Morris
The transformer architecture was introduced in the landmark 2017 paper Attention is All You Need (Vaswani et al., 2017) and traditionally consists of alternating attention and multilayer-perceptron (MLP) sublayers. Although initially used for machine translation, transformers have since been applied across a wide range of tasks, including language modeling (Radford et al., 2018; Devlin et al., 2019; Liu et al., 2018), computer vision (Khan et al., 2022; Cornia et al., 2020), and image generation (Parmar et al., 2018). The widespread deployment of transformers has led to increasing interest in mechanistic interpretability (Wang et al., 2022; Conmy et al., 2023), which seeks to translate the computations of transformers into human-understandable explanations. Some interpretability efforts, such as Elhage et al. (2021), have focused on attention-only transformers, finding that MLP layers were harder to interpret. This work supplements those mechanistic interpretability methods by showing that MLP layers in transformers are equivalent to a sum of masked attention heads and can therefore be subjected to interpretability techniques that work on attention-only transformers. In Theorem 3 we show that, by including a "bias token" akin to the persistent memory vectors of Sukhbaatar et al. (2019) and using a slightly unusual attention-masking pattern, an MLP layer of size l can be written as the sum of l attention heads, each with internal dimension 1. We show in Theorem 6 that this process can be applied throughout the transformer, converting a typical MLP-and-attention transformer into an attention-only transformer. We then show in Theorems 7 and 8 that attention heads can separately implement row-wise linear transformations and matrix-level activation functions. Finally, we show in Theorem 9 that a slightly augmented network can approximate any masking pattern to within arbitrary error.
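To make the Theorem 3 construction concrete, the following is a minimal numerical sketch of how a single SiLU neuron can be reproduced by one attention head with internal dimension 1. The specific weight choices here (query and value projections both set to the neuron's input weights w, a constant key read off an appended homogeneous coordinate, and an all-zeros bias token) are illustrative assumptions on our part, not necessarily the paper's exact parameterization; the mask restricting each token to attend only to itself and the bias token stands in for the "slightly unusual attention-masking pattern."

```python
import numpy as np

def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d, n = 8, 5                      # model dimension, sequence length
w = rng.normal(size=d)           # MLP neuron input weights
u = rng.normal(size=d)           # MLP neuron output direction
X = rng.normal(size=(n, d))      # token residual-stream vectors

# Direct MLP neuron: SiLU(w . x) * u for each token.
mlp_out = silu(X @ w)[:, None] * u

# Attention-head construction (internal dimension 1).
# Append a homogeneous coordinate of 1 to every real token and
# prepend a "bias token" whose homogeneous coordinate is 0.
Xh = np.concatenate([X, np.ones((n, 1))], axis=1)   # (n, d+1)
bias_tok = np.zeros(d + 1)                          # bias token
seq = np.vstack([bias_tok, Xh])                     # (n+1, d+1)

W_Q = np.concatenate([w, [0.0]])  # query score = w . x  (scalar)
W_K = np.eye(d + 1)[-1]           # key = homogeneous coord: 1 for tokens, 0 for bias
W_V = np.concatenate([w, [0.0]])  # value = w . x (scalar); 0 for the bias token

q = seq @ W_Q                     # (n+1,) one-dimensional queries
k = seq @ W_K                     # (n+1,) one-dimensional keys
v = seq @ W_V                     # (n+1,) one-dimensional values

# Unusual mask: each real token attends only to itself and the bias token.
head_out = np.zeros((n, d))
for i in range(1, n + 1):
    scores = np.array([q[i] * k[0], q[i] * k[i]])    # [bias, self] = [0, w.x]
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over the pair
    val = weights[0] * v[0] + weights[1] * v[i]      # = sigmoid(w.x) * (w.x)
    head_out[i - 1] = val * u                        # output projection

assert np.allclose(head_out, mlp_out)  # head reproduces the SiLU neuron
```

Because the softmax over the two-element set {self, bias} reduces to a sigmoid of the score difference, this head realizes x * sigmoid(x) exactly; activations such as ReLU or GeLU would instead be matched by close approximation. Stacking l such heads, one per neuron, yields the sum described in Theorem 3.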
arXiv.org Artificial Intelligence
Sep-15-2023