Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation

Floris Holstege, Bram Wouters, Noud van Giersbergen, Cees Diks

arXiv.org Machine Learning 

This crucially differs from existing methods, which only focus on the spurious concept features, risking the loss of vital main-task information. Furthermore, we make the identification of the subspaces systematic by introducing statistical tests that attribute directions in the embedding space to either the main-task or the spurious concept. The method, which we call Joint Subspace Estimation (JSE), is shown to be robust against the strength of the spurious correlation and to outperform existing concept-removal methods for a Toy dataset as well as benchmark datasets for image recognition (Waterbirds, CelebA) and natural language processing (MultiNLI). A high-level overview of the method is given in Figure 1.

Figure 1: High-level overview of Joint Subspace Estimation (JSE) for concept removal. The input x is fed through a neural network f(x), from which we can extract the vector representation z. Within the vector representation, two orthogonal subspaces are identified: one related to the spurious concept (the background), and one to the main-task concept (bird type). JSE estimates the subspaces of the two concepts simultaneously to prevent mixing of spurious and main-task features.
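The removal step described in the Figure 1 caption can be illustrated with a minimal sketch. It is not the JSE estimation procedure itself: it assumes the two orthogonal subspaces have already been identified, and only shows how projecting z onto the orthogonal complement of the spurious subspace removes the spurious component while leaving the main-task component untouched. The names `orthonormal_basis`, `remove_subspace`, `v_sp`, and `v_mt` are hypothetical, and the directions are random placeholders rather than estimates from data.

```python
import numpy as np


def orthonormal_basis(directions):
    """Return orthonormal columns spanning the given direction vectors (columns)."""
    q, _ = np.linalg.qr(directions)
    return q


def remove_subspace(z, basis):
    """Project embeddings z (n, d) onto the orthogonal complement of span(basis) (d, k)."""
    return z - (z @ basis) @ basis.T


rng = np.random.default_rng(0)
d = 8  # embedding dimension (illustrative)

# Assumption: two mutually orthogonal directions are already known,
# one for the spurious concept and one for the main-task concept.
q = orthonormal_basis(rng.normal(size=(d, 2)))
v_sp, v_mt = q[:, :1], q[:, 1:2]

z = rng.normal(size=(5, d))          # stand-in for representations z
z_clean = remove_subspace(z, v_sp)   # remove the spurious subspace only

# The spurious component vanishes; the main-task component is preserved
# because the two subspaces are orthogonal.
spurious_removed = np.allclose(z_clean @ v_sp, 0.0)
main_task_kept = np.allclose(z_clean @ v_mt, z @ v_mt)
print(spurious_removed, main_task_kept)
```

The orthogonality of the two subspaces is what makes the second check hold: if the spurious direction overlapped with the main-task direction, removing it would also remove main-task information, which is the risk the abstract attributes to methods that estimate only the spurious subspace.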
