Trustworthy Feature Importance Avoids Unrestricted Permutations

Borgonovo, Emanuele, Cappelli, Francesco, Lu, Xuefei, Plischke, Elmar, Rudin, Cynthia

arXiv.org Machine Learning 

Since their introduction by Breiman (2001), permutation-based feature importance measures have been widely adopted. However, randomly permuting the entries of a dataset may create new points far from the original data or even "impossible data." In a permuted dataset, we may find children who are retired or individuals who graduated from high school before they were born (Mase et al. 2022, p. 1). Forcing ML models to make predictions at these points causes them to extrapolate, making explanations unreliable (Hooker et al. 2021). Every non-trivial permutation-based variable importance measure, including SHAP (Lundberg and Lee 2017), Knockoffs (Barber and Candès 2015), conditional model reliance (Fisher et al. 2019), and accumulated local effect (ALE) plots (Apley and Zhu 2020), suffers from this problem. We propose and compare three new strategies to address extrapolation issues. The first combines conditional model reliance from Fisher et al. (2019) with a Gaussian transformation. By mapping data quantiles to a Gaussian distribution and back, we adjust only the quantiles of point values, significantly reducing extrapolation. Under a Gaussian copula assumption for the feature distribution, we prove that the new data points follow the same probability distribution as the original data.
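
The quantile-to-Gaussian-and-back idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `to_gaussian` and `gaussian_copula_resample` are hypothetical, and the choices of rank-based normal scores, a plain correlation estimate, and `np.quantile` for the back-transformation are assumptions made for illustration. The sketch covers only the resampling step, not the conditional model reliance computation it would feed into.

```python
import numpy as np
from scipy.stats import norm, rankdata


def to_gaussian(X):
    """Map each column to normal scores via its empirical quantiles."""
    n = X.shape[0]
    # Ranks divided by (n + 1) lie strictly inside (0, 1), so norm.ppf stays finite.
    U = np.apply_along_axis(rankdata, 0, X) / (n + 1)
    return norm.ppf(U)


def gaussian_copula_resample(X, j, rng):
    """Redraw feature j given the other features, assuming a Gaussian copula.

    Hypothetical helper for illustration only. Returns a copy of X in which
    column j is replaced and all other columns are left unchanged.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    Z = to_gaussian(X)
    R = np.corrcoef(Z, rowvar=False)  # correlation matrix on the Gaussian scale
    others = [k for k in range(d) if k != j]

    # Conditional normal: Z_j | Z_others ~ N(w . z_others, cond_var)
    w = np.linalg.solve(R[np.ix_(others, others)], R[others, j])
    cond_mean = Z[:, others] @ w
    cond_var = max(R[j, j] - R[j, others] @ w, 0.0)
    z_new = cond_mean + np.sqrt(cond_var) * rng.standard_normal(n)

    # Back to the data scale: read off empirical quantiles of the observed
    # column, so new values never leave the observed range of feature j.
    u_new = norm.cdf(z_new)
    X_new = X.copy()
    X_new[:, j] = np.quantile(X[:, j], u_new)
    return X_new


# Usage on synthetic data: columns 0 and 1 are strongly correlated.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.9, 0.0],
                [0.9, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
X = rng.multivariate_normal(np.zeros(3), cov, size=2000)
X_new = gaussian_copula_resample(X, j=0, rng=rng)
```

Unlike an unrestricted permutation of column 0, the redrawn column keeps its dependence on the other features, so the new rows stay close to the region where the data actually lie.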
