X Hacking: The Threat of Misguided AutoML

Rahul Sharma, Sergey Redyuk, Sumantrak Mukherjee, Andrea Sipka, Sebastian Vollmer, David Selby

arXiv.org Artificial Intelligence 

Machine learning models are increasingly used to make decisions that affect human lives, society and the environment, in areas such as medical diagnosis, criminal justice and public policy. However, these models are often complex and opaque--especially with the increasing ubiquity of deep learning and generative AI--making it difficult to understand how and why they produce certain predictions. Explainable AI (XAI) is a field of research that aims to provide interpretable and transparent explanations for the outputs of machine learning models. The growing demand for model interpretability, along with a trend towards 'data-driven' decisions, has the unexpected side-effect of creating an increased incentive for abuse and manipulation. Data analysts may have a vested interest, or be pressured, to present a certain explanation for a model's predictions, whether to confirm a pre-specified conclusion, to conceal a hidden agenda, or to avoid ethical scrutiny. In this paper, we introduce the concept of explanation hacking, or X-hacking, a form of p-hacking applied to XAI metrics. X-hacking refers to the practice of deliberately searching for and selecting models that produce a desired explanation while maintaining 'acceptable' predictive performance, according to some benchmark. Unlike fairwashing attacks, X-hacking does not involve manipulating the model architecture or its explanations; rather, it explores plausible combinations of analysis decisions.
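To make the idea concrete, below is a minimal sketch of what such a search could look like. It is an illustration, not the paper's actual method: the synthetic dataset, the accuracy floor of 0.80, the hyperparameter grid, the target feature index, and the use of impurity-based feature importance as the stand-in XAI metric are all assumptions made for the example.

```python
# Hypothetical X-hacking loop: enumerate plausible analysis decisions,
# keep every model whose accuracy clears a benchmark, then cherry-pick
# the one whose explanation best matches a desired narrative.
import itertools
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

DESIRED_FEATURE = 3    # feature the analyst wants to appear important
ACCURACY_FLOOR = 0.80  # 'acceptable' predictive performance benchmark

candidates = []
# Plausible combinations of analysis decisions (here: hyperparameters
# and random seeds; in practice also preprocessing, feature sets, etc.)
for n_est, depth, seed in itertools.product([50, 200], [2, 5, None], range(5)):
    model = RandomForestClassifier(
        n_estimators=n_est, max_depth=depth, random_state=seed
    ).fit(X_train, y_train)
    acc = model.score(X_test, y_test)
    if acc >= ACCURACY_FLOOR:  # each surviving model looks defensible
        # Explanation metric: impurity-based feature importance;
        # SHAP values or other XAI metrics could be substituted.
        importance = model.feature_importances_[DESIRED_FEATURE]
        candidates.append((importance, acc, (n_est, depth, seed)))

# X-hacking: report only the model whose explanation fits the agenda.
best = max(candidates, key=lambda c: c[0])
print(f"selected pipeline {best[2]}: accuracy={best[1]:.3f}, "
      f"importance of feature {DESIRED_FEATURE}={best[0]:.3f}")
```

Note that every candidate surviving the accuracy filter is individually defensible; it is the selective reporting of the one with the preferred explanation that constitutes the hack, which is what makes it difficult to detect after the fact.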