Cooperative effects in feature importance of individual patterns: application to air pollutants and Alzheimer disease

Ontivero-Ortega, M., Fania, A., Lacalamita, A., Bellotti, R., Monaco, A., Stramaglia, S.

arXiv.org Artificial Intelligence 

In [1], a novel global feature importance method for regression was introduced for explainable artificial intelligence (XAI) [2], based on recent results that generalize the traditional dyadic description of networks of variables to the higher-order setting [3, 4]. Notably, increasing attention is being devoted to the emergent properties of complex systems. A prominent role in this literature is played by partial information decomposition (PID) [5] and its subsequent developments [6], which exploit information-theoretic tools to reveal high-order dependencies among groups of three or more random variables and to describe their synergistic or redundant nature [7-11]. Within this framework, redundancy refers to information retrievable from multiple sources, while synergy refers to statistical relationships that exist within the whole system but cannot be observed in its individual parts. The approach described in [1], named Hi-Fi (high-order interactions for feature importance), is rooted in a well-known metric of feature importance, Leave-One-Covariate-Out (LOCO) [12], i.e. the reduction of the prediction error when the feature under consideration is added to the set of all the other features used for regression. Hi-Fi proposes an adaptive version of LOCO that provides three scores for each feature: the unique, purely standalone (two-body) influence of the feature on the target, and the contributions stemming from synergistic and redundant interactions with other features. It is worth mentioning that the decomposition of feature importance in [1] also depends on the choice of the hypothesis space for regression; hence it should be assumed that a proper model for the data has been selected.
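
To make the LOCO definition above concrete, the following is a minimal sketch of the plain (non-adaptive) LOCO score, not the Hi-Fi decomposition of [1]. It assumes ordinary least squares as the hypothesis space and the in-sample mean squared error as the prediction error; the function names `mse_linear_fit` and `loco` and the synthetic data are illustrative choices, not taken from the source.

```python
import numpy as np


def mse_linear_fit(X, y):
    """In-sample mean squared error of a least-squares fit with intercept.

    Assumption: a linear model is used purely for illustration.
    """
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(np.mean(resid ** 2))


def loco(X, y, i):
    """Reduction of the prediction error when feature i is added
    to the set of all the other features."""
    others = np.delete(X, i, axis=1)
    return mse_linear_fit(others, y) - mse_linear_fit(X, y)


# Synthetic data (hypothetical): the target depends strongly on feature 0,
# weakly on feature 1, and not at all on feature 2.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)

scores = [loco(X, y, i) for i in range(X.shape[1])]
for i, s in enumerate(scores):
    print(f"feature {i}: LOCO = {s:.4f}")
```

In this sketch the score is computed against the full set of remaining features only. With a different hypothesis space (for example, a nonlinear regressor) or an out-of-sample error estimate, the scores would generally change, which is the model dependence noted above.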