Fooling Partial Dependence via Data Poisoning
Explainable machine learning holds great promise for developers and auditors working with black-box predictive models. Alarmingly, recent studies show that many explanations are not trustworthy and can be manipulated adversarially. Hence, post-hoc explainability needs to be evaluated as critically as model performance. In this paper, we present techniques for attacking Partial Dependence (plots, profiles, PDP), which are among the most popular methods of explaining any predictive model trained on tabular data. This is especially crucial in financial or medical applications, where auditability has become a must-have trait supporting decisions made by black-box models.
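For readers unfamiliar with Partial Dependence, the following is a minimal sketch of how a PD profile is commonly estimated, and why it is sensitive to the data it is computed on. It is an illustration only, not the attack described in the paper: the helper `partial_dependence`, the model `toy_model`, and the synthetic data are all hypothetical names and values chosen for this example.

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Estimate a PD profile: for each grid value, overwrite one column
    of X with that value and average the model's predictions."""
    profile = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value          # force the feature to the grid value
        profile.append(predict(X_mod).mean())
    return np.array(profile)

def toy_model(X):
    # Hypothetical fixed "black box" with an interaction between x0 and x1.
    return 2.0 * X[:, 0] + X[:, 0] * X[:, 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))              # synthetic tabular data (assumption)
grid = np.linspace(-1.0, 1.0, 5)

# PD of feature 0 on the original data.
pd_curve = partial_dependence(toy_model, X, feature=0, grid=grid)

# Same model, but the background data is altered in feature 1 only.
X_shifted = X.copy()
X_shifted[:, 1] -= 2.0
pd_shifted = partial_dependence(toy_model, X_shifted, feature=0, grid=grid)

print(pd_curve)
print(pd_shifted)
```

In this toy setting the model is never modified, yet the profile for feature 0 changes from a clearly increasing line to a nearly flat one once the data in feature 1 is shifted. This dependence of the explanation on the dataset is what makes a data-based manipulation conceivable.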
Aug-9-2021, 03:15:25 GMT