Unsupervised Interpretable Basis Extraction for Concept-Based Visual Explanations

Alexandros Doumanoglou, Stylianos Asteriadis, Dimitrios Zarpalas

arXiv.org Artificial Intelligence 

Abstract--An important line of research attempts to explain CNN image classifier predictions and intermediate layer representations in terms of human-understandable concepts. In this work, we expand on previous works in the literature that use annotated concept datasets to extract interpretable feature-space directions, and we propose an unsupervised post-hoc method to extract a disentangling interpretable basis by looking for the rotation of the feature space that explains sparse, one-hot, thresholded transformed representations of pixel activations.

CNNs can be used in robotics, visual understanding, automatic risk assessment and more. However, to a human expert, CNNs are often black boxes and the reasoning behind their predictions can be unclear. Beyond this early result, more recently, rigorous experimentation showed that the linear separability of features corresponding to different semantic concepts increases towards the top layer [6]. The latter has been attributed to the top layer's linearity and the fact that intermediate layers are enforced to …

A. Doumanoglou and D. Zarpalas are with the Information Technologies Institute (ITI), Centre for Research and Technology HELLAS (CERTH), Thessaloniki.

Figure 1: Left: In a standard convolution layer with D filters, all the filters work together to transform each input patch to a feature vector of spatial dimensionality 1×1. Thus the dimensionality of the feature space equals the number of filters in the layer, and each spatial element of the transformed representation constitutes a sample in this feature space. Middle: To find an interpretable basis in this feature space in a supervised way means to train a set of linear classifiers (concept detectors), one for each interpretable concept, using feature vectors corresponding to image patches that contain the concept. We observe that, in a successfully learned interpretable basis, a single pixel is classified positively by at most one classifier among a group of classifiers trained to detect mutually exclusive concepts. Projecting new feature vectors onto such a basis yields a transformed sparse representation.
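
To make the description above more concrete, below is a minimal sketch, assuming a PyTorch setting, of the two steps Figure 1 alludes to: reshaping a convolutional feature map so that each spatial location becomes a D-dimensional sample (Figure 1, Left), and searching for a rotation of that feature space whose thresholded projections are sparse and approximately one-hot. The helper names (features_to_samples, cayley_rotation, sparsity_loss, fit_rotation), the Cayley parametrization of the rotation, the particular sparsity objective, and the hyperparameters (tau, steps, lr) are illustrative assumptions, not the authors' exact formulation. The supervised alternative in Figure 1 (Middle) would instead fit one linear classifier per annotated concept on the same samples.

```python
# Illustrative sketch only; not the paper's implementation.
import torch
import torch.nn.functional as F

def features_to_samples(feature_map: torch.Tensor) -> torch.Tensor:
    """Reshape a conv feature map (N, D, H, W) into (N*H*W, D) samples,
    one sample per spatial location, as described for Figure 1 (Left)."""
    n, d, h, w = feature_map.shape
    return feature_map.permute(0, 2, 3, 1).reshape(-1, d)

def cayley_rotation(a: torch.Tensor) -> torch.Tensor:
    """Build an orthonormal matrix Q from a free parameter `a` via the
    Cayley transform Q = (I - S)(I + S)^-1, with S skew-symmetric."""
    s = a - a.t()                                   # skew-symmetric part
    i = torch.eye(a.shape[0], device=a.device)
    return (i - s) @ torch.inverse(i + s)

def sparsity_loss(z: torch.Tensor, tau: float = 0.0) -> torch.Tensor:
    """Encourage thresholded codes to be sparse and close to one-hot:
    penalize every above-threshold coordinate except the largest one."""
    active = F.relu(z - tau)                        # soft thresholding
    top = active.max(dim=1).values                  # dominant concept per sample
    return (active.sum(dim=1) - top).mean()

def fit_rotation(samples: torch.Tensor, steps: int = 1000, lr: float = 1e-2) -> torch.Tensor:
    """Gradient-based search for a rotation of the feature space whose
    thresholded projections are sparse one-hot codes (illustrative objective)."""
    d = samples.shape[1]
    a = torch.zeros(d, d, requires_grad=True)       # free parameter behind Q
    opt = torch.optim.Adam([a], lr=lr)
    for _ in range(steps):
        q = cayley_rotation(a)
        loss = sparsity_loss(samples @ q.t())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return cayley_rotation(a).detach()              # rows: candidate interpretable directions
```

In this sketch, the rows of the returned matrix play the role of candidate interpretable basis directions onto which new feature vectors can be projected; the paper's actual objective and optimization details may differ.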
