Ultra-marginal Feature Importance: Learning from Data with Causal Guarantees

Joseph Janssen, Vincent Guan, Elina Robeva

arXiv.org Artificial Intelligence 

Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. Marginal contribution feature importance (MCI) was developed to break this trend by providing a useful framework for quantifying the relationships in data. In this work, we aim to improve upon the theoretical properties, performance, and runtime of MCI by introducing ultramarginal feature importance (UMFI), which uses dependence removal techniques from the AI fairness literature as its foundation. We first propose axioms for feature importance methods that seek to explain the causal and associative relationships in data, and we prove that UMFI satisfies these axioms under basic assumptions. We then show on real and simulated data that UMFI performs better than MCI, especially in the presence of correlated interactions and unrelated features, while partially learning the structure of the causal graph and reducing the exponential runtime

Recently, feature importance methods such as Shapley values (Shapley, 1953; Cohen et al., 2007; Lundberg and Lee, 2017), Shapley additive global importance (SAGE) (Covert et al., 2020), accumulated local effects (ALE) (Apley and Zhu, 2020), permutation importance (PI) (Breiman, 2001), and conditional permutation importance (CPI) (Debeer and Strobl, 2020), have been used in high-impact journal papers by scientists who want to explain the mechanisms behind observational data (Addor et al., 2018; Bazaga et al., 2020; Stein et al., 2021; Johnsen et al., 2021; Schmidt et al., 2020; Gill et al., 2017; Janssen et al., 2022). However, these methods are predominantly for model explanation or feature selection, so they have many shortcomings when used for other purposes such as scientific inference (Freiesleben et al., 2022; Catav et al., 2021). ALE can nicely display how changes in inputs lead to altered model predictions but important higher order effects are omitted (Molnar, 2020), and although CPI improves upon some limitations of PI, CPI gives zero importance to perfectly correlated features even if they offer significant explanatory power towards the response (Covert et al., 2020). Similarly, Shapley values diminish the importance of duplicated or highly correlated features (Catav et al., 2021). Further, only one model is trained in ALE, CPI, and PI.
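
The claim that Shapley values diminish the importance of duplicated features can be illustrated with a small sketch. It is not taken from the paper: the evaluation function `toy_value` and the feature names `x1`, `x1_copy`, and `x2` are hypothetical, and the setup assumes a response that is fully explained by `x1` alone, with `x1_copy` an exact duplicate of `x1` and `x2` unrelated. Shapley values are computed exactly by averaging marginal contributions over all feature orderings, which is feasible only for a handful of features.

```python
from itertools import permutations
from fractions import Fraction


def shapley_values(features, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {f: Fraction(0) for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        seen = frozenset()
        for f in order:
            with_f = seen | {f}
            # Marginal contribution of f given the features that precede it.
            totals[f] += value(with_f) - value(seen)
            seen = with_f
    return {f: totals[f] / len(orderings) for f in features}


def toy_value(subset):
    """Hypothetical evaluation function: fraction of the response explained.

    Assumption: the response depends only on x1, and x1_copy duplicates x1,
    so either one alone explains everything; x2 is unrelated to the response.
    """
    return Fraction(1) if ("x1" in subset or "x1_copy" in subset) else Fraction(0)


without_duplicate = shapley_values(["x1", "x2"], toy_value)
with_duplicate = shapley_values(["x1", "x1_copy", "x2"], toy_value)

print(without_duplicate)
print(with_duplicate)
```

In this toy setting `x1` receives importance 1 when it appears alone and 1/2 once its duplicate is added, even though its relationship to the response is unchanged; the unrelated `x2` receives 0 in both cases.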
