The Bouncer Problem: Challenges to Remote Explainability

Erwan Le Merrer, Gilles Tredan

arXiv.org Machine Learning 

The concept of explainability is envisioned to satisfy society's demands for transparency about machine learning decisions. The concept is simple: like humans, algorithms should explain the rationale behind their decisions so that their fairness can be assessed. While this approach is promising in a local context (e.g., to explain a model during debugging at training time), we argue that this reasoning cannot simply be transposed to a remote context, where a model trained by a service provider is only accessible through its API. This is problematic, as it constitutes precisely the target use case requiring transparency from a societal perspective. Through an analogy with a club bouncer (who may provide untruthful explanations when rejecting a customer), we show that providing explanations cannot prevent a remote service from lying about the true reasons leading to its decisions. More precisely, we prove the impossibility of remote explainability for single explanations, by constructing an attack on explanations that hides discriminatory features from the querying user. We provide an example implementation of this attack. We then show that the probability that an observer spots the attack, using several explanations in an attempt to find inconsistencies, is low in practical settings. This undermines the very concept of remote explainability in general.

1 Introduction

Modern decision-making driven by black-box systems now impacts a significant share of our lives [9, 29]. These systems build on user data, and range from recommenders [21] (e.g., for personalized ranking of information on websites) to predictive algorithms (e.g., credit default) [29]. This widespread deployment, together with the opacity of the decision processes of these systems, raises concerns about transparency for the general public and for policy makers [12]. In some jurisdictions (e.g., the United States of America and Europe), this has translated into a so-called right to explanation [12, 26], which states that the output decisions of an algorithm must be motivated.

Explainability of in-house models

An already large body of work is interested in the explainability of implicit machine learning models (such as neural network models) [2, 13, 20]. Indeed, these models show state-of-the-art performance when it comes to task accuracy, but they are not designed to provide explanations (or at least intelligible decision processes) when one wants more than the output decision of the model. In the context of recommendation, the expression "post hoc explanation" has been coined [32].
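To make the attack concrete, here is a minimal toy sketch in Python (not the paper's implementation): the service decides with a model that uses a sensitive feature, but answers explanation queries with attributions from an innocuous surrogate model chosen to agree with the returned decision. The feature names, weights, and linear attribution scheme are illustrative assumptions.

```python
# Toy sketch of the bouncer attack (illustrative only, not the paper's code).
# The service decides with a model that uses a sensitive feature ("gender"),
# but explains each decision with an innocuous surrogate model chosen so that
# its decision on the queried point matches the one actually returned.

TRUE_WEIGHTS = {"income": 0.6, "debt_ratio": -0.5, "gender": -0.8}  # really used to decide
SURROGATES = [                                                      # used only to explain
    {"income": 0.9, "debt_ratio": -0.2},   # lenient, tends to accept
    {"income": 0.2, "debt_ratio": -0.9},   # strict, tends to reject
]

def score(weights, x):
    """Linear score; the decision is 'accept' iff the score is positive."""
    return sum(w * x[f] for f, w in weights.items())

def decide(x):
    # The real decision uses the sensitive feature.
    return score(TRUE_WEIGHTS, x) > 0

def explain(x):
    # Per-query attack: report the attributions of a surrogate that agrees
    # with the decision actually returned, so the sensitive feature never
    # appears in the explanation.
    target = decide(x)
    for weights in SURROGATES:
        if (score(weights, x) > 0) == target:
            return {f: w * x[f] for f, w in weights.items()}
    return {}  # toy fallback; the paper's construction always yields a surrogate

if __name__ == "__main__":
    applicant = {"income": 0.4, "debt_ratio": 0.3, "gender": 1.0}
    print("decision:", decide(applicant))      # rejected, mainly because of "gender"
    print("explanation:", explain(applicant))  # mentions only innocuous features
```

Each individual answer looks coherent on its own: the reported attributions sum to a score whose sign matches the returned decision, while the discriminatory feature never appears.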

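As for detection, a minimal sketch of what an observer could try under the same toy linear-attribution assumption: gather several (input, explanation) pairs and flag the service if two answers imply different weights for the same feature, i.e. if they cannot originate from one fixed innocuous model. The function names and the tolerance are hypothetical.

```python
# Toy coherence check by an external observer (illustrative only).
# Under a linear attribution scheme, attribution / feature value gives an
# implied per-feature weight; two explanations implying different weights
# for the same feature cannot come from a single fixed model.

def implied_weights(x, explanation):
    return {f: a / x[f] for f, a in explanation.items() if x[f] != 0}

def spots_attack(observations, tol=1e-9):
    """observations: list of (features, explanation) pairs gathered over queries."""
    seen = {}
    for x, explanation in observations:
        for f, w in implied_weights(x, explanation).items():
            if f in seen and abs(seen[f] - w) > tol:
                return True  # two answers are mutually incoherent
            seen[f] = w
    return False

if __name__ == "__main__":
    # Two answers that happen to come from the same innocuous surrogate look
    # perfectly coherent, so the observer learns nothing from them.
    obs = [
        ({"income": 0.4, "debt_ratio": 0.3}, {"income": 0.08, "debt_ratio": -0.27}),
        ({"income": 0.2, "debt_ratio": 0.4}, {"income": 0.04, "debt_ratio": -0.36}),
    ]
    print("attack spotted:", spots_attack(obs))  # False
```

Such a check only succeeds if the observer happens to collect mutually incoherent answers; the paper argues that in practical settings the probability of this event is low.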