conditional feature importance
Conditional Feature Importance with Generative Modeling Using Adversarial Random Forests
Blesch, Kristin, Koenen, Niklas, Kapar, Jan, Golchian, Pegah, Burk, Lukas, Loecher, Markus, Wright, Marvin N.
Explainable artificial intelligence (XAI) aims to shed light on the opaque behavior of machine learning algorithms, which includes assessing the importance of features for a predictive algorithm. Model-agnostic post hoc methods assign scores to input features according to their relevance for the predictions of an arbitrary, already fitted supervised machine learning model (Molnar, 2020; Murdoch et al., 2019). Refined conceptualizations include methods that explain the predictions for individual observations, such as Shapley additive explanations (Lundberg and Lee, 2017), and feature importance measures that focus on the model's overall behavior, yielding global explanations. A crucial distinction among feature importance concepts is between conditional and marginal viewpoints (Strobl et al., 2008; Watson and Wright, 2021): marginal feature importance evaluates a feature's impact irrespective of the other features included in the model, whereas conditional feature importance takes the predictive information of the other features into account. Dependency structures, which real-world datasets frequently exhibit, play a pivotal role in this distinction: measuring a feature's impact on the prediction given, i.e., on top of, the predictive information already provided by correlated features changes the importance score attributed to it (Watson and Wright, 2021).
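The following sketch illustrates the marginal versus conditional distinction on simulated data. It is not the method of the paper (whose title indicates that it uses adversarial random forests as a generative model for conditional sampling); instead, it assumes two standard normal features with correlation 0.9, a response y = x1 + x2 + noise, and an ordinary least-squares model fitted with NumPy, so that the true conditional distribution of x2 given x1 is known. Marginal importance of x2 is measured by permuting it, which breaks its dependence on x1; conditional importance redraws x2 from N(0.9 * x1, 1 - 0.9^2). In expectation, the increase in mean squared error is about 2.0 in the marginal case and about 2(1 - 0.9^2) = 0.38 in the conditional case, because x1 already carries most of the information in x2.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 100_000, 0.9

# Two standard normal features with correlation rho; both enter the response.
x1 = rng.standard_normal(n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
y = x1 + x2 + rng.standard_normal(n)
X = np.column_stack([x1, x2])

# Ordinary least squares with an intercept (stand-in for any fitted model).
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)


def mse(Z):
    """Mean squared error of the fitted model on feature matrix Z."""
    return np.mean((y - (coef[0] + Z @ coef[1:])) ** 2)


base = mse(X)

# Marginal: permute x2, which also breaks its dependence on x1.
X_marg = X.copy()
X_marg[:, 1] = rng.permutation(x2)

# Conditional: redraw x2 from its true conditional distribution given x1.
X_cond = X.copy()
X_cond[:, 1] = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)

marginal = mse(X_marg) - base
conditional = mse(X_cond) - base
print(f"marginal: {marginal:.3f}, conditional: {conditional:.3f}")
```

With real data the conditional distribution is unknown and has to be estimated, which is where a generative model comes in. For brevity the sketch evaluates on the training data; in practice a held-out set would be used.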
Global Censored Quantile Random Forest
In recent years, censored quantile regression has become increasingly popular for survival analysis, yet many existing works rely on linearity assumptions. In this work, we propose a Global Censored Quantile Random Forest (GCQRF) for predicting a conditional quantile process on data subject to right censoring: a flexible, competitive forest-based method able to capture complex nonlinear relationships. Taking into account the randomness in trees and connecting the proposed method to a randomized incomplete infinite-degree U-process (IDUP), we quantify the variation of the prediction process without assuming an infinite forest and establish its weak convergence. Moreover, we propose feature importance ranking measures based on out-of-sample predictive accuracy. We demonstrate the superior predictive accuracy of the proposed method over a number of existing alternatives and illustrate the use of the proposed importance ranking measures on both simulated and real data.
Conditional Feature Importance for Mixed Data
Blesch, Kristin, Watson, David S., Wright, Marvin N.
Despite the popularity of feature importance (FI) measures in interpretable machine learning, the statistical adequacy of these methods is rarely discussed. From a statistical perspective, a major distinction is between analyzing a variable's importance before and after adjusting for covariates, i.e., between $\textit{marginal}$ and $\textit{conditional}$ measures. Our work draws attention to this rarely acknowledged, yet crucial distinction and showcases its implications. Further, we reveal that only a few methods are available for testing conditional FI and that practitioners have hitherto been severely restricted in applying them due to mismatched data requirements. Most real-world data exhibits complex feature dependencies and incorporates both continuous and categorical data (mixed data). Both properties are often neglected by conditional FI measures. To fill this gap, we propose combining the conditional predictive impact (CPI) framework with sequential knockoff sampling. The CPI enables conditional FI measurement that controls for any feature dependencies by sampling valid knockoffs, i.e., generating synthetic data with statistical properties similar to those of the data being analyzed. Sequential knockoffs were deliberately designed to handle mixed data and thus allow us to extend the CPI approach to such datasets. We demonstrate through numerous simulations and a real-world example that our proposed workflow controls the type I error, achieves high power, and agrees with results given by other conditional FI measures, whereas marginal FI metrics lead to misleading interpretations. Our findings highlight the necessity of developing statistically adequate, specialized methods for mixed data.
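A minimal sketch of the CPI idea, under several simplifying assumptions: it uses equicorrelated Gaussian model-X knockoffs with a known covariance matrix instead of the sequential knockoffs proposed above, so it only covers continuous features; the fitted model is ordinary least squares; and the one-sided test on per-observation loss differences uses a large-sample normal approximation rather than an exact t-test. The function names gaussian_knockoffs, fit_ols and cpi are hypothetical. In the simulation, X2 is correlated with X1 (correlation 0.7) but does not enter the response, so its CPI should be close to zero, while X1 and X3 should receive clearly positive, significant scores.

```python
import math

import numpy as np


def gaussian_knockoffs(X, Sigma, rng):
    """Hypothetical helper: equicorrelated Gaussian knockoffs for rows X ~ N(0, Sigma),
    assuming Sigma is a known correlation matrix (a swapped-in construction)."""
    p = Sigma.shape[0]
    D = np.diag(np.full(p, min(1.0, 2.0 * np.linalg.eigvalsh(Sigma).min())))
    Sigma_inv = np.linalg.inv(Sigma)
    mean = X - X @ Sigma_inv @ D               # conditional mean of knockoffs given X
    V = 2.0 * D - D @ Sigma_inv @ D            # conditional covariance of knockoffs
    w, U = np.linalg.eigh(V)
    root = U * np.sqrt(np.clip(w, 0.0, None))  # root @ root.T == V (PSD square root)
    return mean + rng.standard_normal(X.shape) @ root.T


def fit_ols(X, y):
    """Hypothetical helper: least squares with intercept, returned as a predictor."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: coef[0] + Z @ coef[1:]


def cpi(model, X, y, X_tilde):
    """Hypothetical helper: per-feature CPI (mean increase in squared error when
    column j is replaced by its knockoff) and a one-sided p-value (normal approx.)."""
    base = (y - model(X)) ** 2
    scores, pvals = [], []
    for j in range(X.shape[1]):
        X_j = X.copy()
        X_j[:, j] = X_tilde[:, j]
        delta = (y - model(X_j)) ** 2 - base   # per-observation loss differences
        est = delta.mean()
        t = est / (delta.std(ddof=1) / math.sqrt(len(delta)))
        scores.append(est)
        pvals.append(0.5 * math.erfc(t / math.sqrt(2.0)))  # P(Z > t)
    return np.array(scores), np.array(pvals)


rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.7, 0.0], [0.7, 1.0, 0.0], [0.0, 0.0, 1.0]])


def simulate(n):
    # X2 is correlated with X1 but conditionally irrelevant for y.
    X = rng.multivariate_normal(np.zeros(3), Sigma, size=n)
    y = X[:, 0] + 0.5 * X[:, 2] + rng.standard_normal(n)
    return X, y


X_train, y_train = simulate(2000)
X_test, y_test = simulate(2000)
model = fit_ols(X_train, y_train)
scores, pvals = cpi(model, X_test, y_test, gaussian_knockoffs(X_test, Sigma, rng))
for j, (s, pv) in enumerate(zip(scores, pvals), start=1):
    print(f"X{j}: CPI = {s:.3f}, one-sided p = {pv:.3g}")
```

Fitting on one sample and computing the CPI on an independent test sample keeps the loss differences out-of-sample. Handling categorical features would require replacing gaussian_knockoffs with a knockoff sampler for mixed data, such as the sequential knockoffs the abstract refers to.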