Space Explanations of Neural Network Classification

Labbaf, Faezeh, Kolárik, Tomáš, Blicha, Martin, Fedyukovich, Grigory, Wand, Michael, Sharygina, Natasha

arXiv.org Artificial Intelligence 

Explainability of decision-making AI systems (XAI), and specifically neural networks (NNs), is a key requirement for deploying AI in sensitive areas [18]. A recent trend in explaining NNs is based on formal methods and logic, providing explanations for the decisions of machine learning systems [24, 31, 32, 41, 42, 44] accompanied by provable guarantees regarding their correctness. Yet, rigorous exploration of the continuous feature space requires to estimate decision boundaries with complex shapes. This, however, remains a challenge because existing explanations [24, 31, 32, 41, 42, 44] constrain only individual features and hence fail capturing relationships among the features that are essential to understand the reasons behind the multi-parametrized classification process. We address the need to provide interpretations of NN systems that are as meaningful as possible using a novel concept of Space Explanations, delivered by a flexible symbolic reasoning framework where Craig interpolation [12] is at the heart of the machinery.