Holistically Explainable Vision Transformers

Böhle, Moritz, Fritz, Mario, Schiele, Bernt

arXiv.org Machine Learning 

Transformers increasingly dominate the machine learning landscape across many tasks and domains, which increases the importance of understanding their outputs. While their attention modules provide partial insight into their inner workings, the attention scores have been shown to be insufficient for explaining the models as a whole. To address this, we propose B-cos transformers, which inherently provide holistic explanations for their decisions. Specifically, we formulate each model component (such as the multi-layer perceptrons, attention layers, and the tokenisation module) to be dynamic linear, which allows us to faithfully summarise the entire transformer via a single linear transform. We apply our proposed design to Vision Transformers (ViTs) and show that the resulting models, dubbed Bcos-ViTs, are highly interpretable and perform competitively with baseline ViTs on ImageNet. Code will be made available soon.

Recently, however, CNNs are often surpassed by transformers (Vaswani et al., 2017), which, if the current development is any indication, will replace CNNs for ever more tasks and domains. Transformers are thus bound to impact many aspects of our lives, from healthcare to judicial decisions to autonomous driving. Given the sensitive nature of such areas, it is of utmost importance to ensure that we can explain the underlying models, which remains a challenge for transformers. To explain transformers, prior work has often focused on the models' attention layers (Jain & Wallace, 2019; Serrano & Smith, 2019; Abnar & Zuidema, 2020; Barkan et al., 2021), as they inherently compute their output in an interpretable manner. For a detailed discussion, see the supplement.

The model components are given by: a tokenisation module, a mechanism for providing positional information to the model, multi-layer perceptrons (MLPs), as well as normalisation and attention layers; see Figure 1a.
By addressing the interpretability of each component individually, we obtain transformers that inherently explain their decisions; see, for example, Figure 1 and Figure 1b. In detail, our approach is based on the idea of designing each component to be dynamic linear, i.e. such that it computes an input-dependent linear transform (cf. Böhle et al., 2021; 2022), so that the entire model can be summarised by a single linear transform for each input.
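To make the notion of a dynamic linear model concrete, the following minimal NumPy sketch stacks two toy layers of the form y = W(x) x and collapses them into a single input-dependent matrix. The layer used here (unit-norm weight rows rescaled by their absolute cosine similarity with the input) and the names `dynamic_linear` and `explain` are illustrative assumptions; the sketch is not claimed to match the paper's exact formulation or its released code.

```python
import numpy as np


def dynamic_linear(weight, x, eps=1e-12):
    """Toy dynamic linear layer (an assumption for illustration only).

    Returns (y, w_x) such that y = w_x @ x, where the matrix w_x itself
    depends on the input x.
    """
    # Normalise each weight row to unit length.
    w_hat = weight / (np.linalg.norm(weight, axis=1, keepdims=True) + eps)
    # Cosine similarity between the input and every weight row.
    cos = (w_hat @ x) / (np.linalg.norm(x) + eps)
    # Input-dependent matrix: rows rescaled by |cos|.
    w_x = np.abs(cos)[:, None] * w_hat
    return w_x @ x, w_x


def explain(weights, x):
    """Run a stack of dynamic linear layers and accumulate the single
    matrix w_total that maps this particular x to the output."""
    h = x
    w_total = np.eye(x.shape[0])
    for weight in weights:
        h, w_x = dynamic_linear(weight, h)
        # Each layer is linear for this input, so the matrices compose.
        w_total = w_x @ w_total
    return h, w_total


rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 5)), rng.standard_normal((3, 8))]
x = rng.standard_normal(5)

y, w_total = explain(weights, x)

# Row i of `contributions` splits output i into per-input-dimension terms.
contributions = w_total * x
```

Because every layer is linear for the given input, `w_total @ x` reproduces the model output exactly, and each row of `contributions` sums to the corresponding output. This is the sense in which a single linear transform can summarise the whole model for one input.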
