stylespace
Unifying Image Counterfactuals and Feature Attributions with Latent-Space Adversarial Attacks
Goldwasser, Jeremy, Hooker, Giles
Counterfactuals are a popular framework for interpreting machine learning predictions. These what if explanations are notoriously challenging to create for computer vision models: standard gradient-based methods are prone to produce adversarial examples, in which imperceptible modifications to image pixels provoke large changes in predictions. We introduce a new, easy-to-implement framework for counterfactual images that can flexibly adapt to contemporary advances in generative modeling. Our method, Counterfactual Attacks, resembles an adversarial attack on the representation of the image along a low-dimensional manifold. In addition, given an auxiliary dataset of image descriptors, we show how to accompany counterfactuals with feature attribution that quantify the changes between the original and counterfactual images. These importance scores can be aggregated into global counterfactual explanations that highlight the overall features driving model predictions. While this unification is possible for any counterfactual method, it has particular computational efficiency for ours. We demonstrate the efficacy of our approach with the MNIST and CelebA datasets.
StyleDomain: Efficient and Lightweight Parameterizations of StyleGAN for One-shot and Few-shot Domain Adaptation
Alanov, Aibek, Titov, Vadim, Nakhodnov, Maksim, Vetrov, Dmitry
Domain adaptation of GANs is a problem of fine-tuning GAN models pretrained on a large dataset (e.g. StyleGAN) to a specific domain with few samples (e.g. painting faces, sketches, etc.). While there are many methods that tackle this problem in different ways, there are still many important questions that remain unanswered. In this paper, we provide a systematic and in-depth analysis of the domain adaptation problem of GANs, focusing on the StyleGAN model. We perform a detailed exploration of the most important parts of StyleGAN that are responsible for adapting the generator to a new domain depending on the similarity between the source and target domains. As a result of this study, we propose new efficient and lightweight parameterizations of StyleGAN for domain adaptation. Particularly, we show that there exist directions in StyleSpace (StyleDomain directions) that are sufficient for adapting to similar domains. For dissimilar domains, we propose Affine+ and AffineLight+ parameterizations that allows us to outperform existing baselines in few-shot adaptation while having significantly less training parameters. Finally, we examine StyleDomain directions and discover their many surprising properties that we apply for domain mixing and cross-domain image morphing. Source code can be found at https://github.com/AIRI-Institute/StyleDomain.
CheXplaining in Style: Counterfactual Explanations for Chest X-rays using StyleGAN
Atad, Matan, Dmytrenko, Vitalii, Li, Yitong, Zhang, Xinyue, Keicher, Matthias, Kirschke, Jan, Wiestler, Bene, Khakzar, Ashkan, Navab, Nassir
Deep learning models used in medical image analysis are prone to raising reliability concerns due to their black-box nature. To shed light on these black-box models, previous works predominantly focus on identifying the contribution of input features to the diagnosis, i.e., feature attribution. In this work, we explore counterfactual explanations to identify what patterns the models rely on for diagnosis. Specifically, we investigate the effect of changing features within chest X-rays on the classifier's output to understand its decision mechanism. We leverage a StyleGAN-based approach (StyleEx) to create counterfactual explanations for chest X-rays by manipulating specific latent directions in their latent space. In addition, we propose EigenFind to significantly reduce the computation time of generated explanations. We clinically evaluate the relevancy of our counterfactual explanations with the help of radiologists. Our code is publicly available.
A new experiment: Does AI really know cats or dogs -- or anything?
Most humans can recognize a cat or dog at an early age. Asked to articulate how they know if an animal is a cat or a dog, an adult might fumble for an explanation by describing experience, something such as "cats appraise you in a distant fashion, but dogs try to jump up on you and lick your face." We don't really articulate what we know, in other words. The signature achievement of artificial intelligence in the past two decades is classifying pictures of cats and dogs, among other things, by assigning them to categories. But AI programs never explain how they "know" what they supposedly "know."
Google AI Introduces 'StylEx': A New Approach For A Visual Explanation Of Classifiers
Neural networks are capable of completing a wide range of tasks. Understanding how they arrive at their decisions, on the other hand, is frequently a mystery that goes unexplored. Explaining a neural model's decision process could have a significant social impact in domains where human oversight is crucial, like medical image processing and autonomous driving. These revelations could be instrumental in advising healthcare practitioners and possibly enhancing scientific breakthroughs. For a visual explanation of classifiers, there have been approaches such as attention maps.
Explaining in Style: Training a GAN to explain a classifier in StyleSpace
Lang, Oran, Gandelsman, Yossi, Yarom, Michal, Wald, Yoav, Elidan, Gal, Hassidim, Avinatan, Freeman, William T., Isola, Phillip, Globerson, Amir, Irani, Michal, Mosseri, Inbar
Image classification models can depend on multiple different semantic attributes of the image. An explanation of the decision of the classifier needs to both discover and visualize these properties. Here we present StylEx, a method for doing this, by training a generative model to specifically explain multiple attributes that underlie classifier decisions. A natural source for such attributes is the StyleSpace of StyleGAN, which is known to generate semantically meaningful dimensions in the image. However, because standard GAN training is not dependent on the classifier, it may not represent these attributes which are important for the classifier decision, and the dimensions of StyleSpace may represent irrelevant attributes. To overcome this, we propose a training procedure for a StyleGAN, which incorporates the classifier model, in order to learn a classifier-specific StyleSpace. Explanatory attributes are then selected from this space. These can be used to visualize the effect of changing multiple attributes per image, thus providing image-specific explanations. We apply StylEx to multiple domains, including animals, leaves, faces and retinal images. For these, we show how an image can be modified in different ways to change its classifier output. Our results show that the method finds attributes that align well with semantic ones, generate meaningful image-specific explanations, and are human-interpretable as measured in user-studies.