XAI-FUNGI: Dataset resulting from the user study on comprehensibility of explainable AI algorithms

Bobek, Szymon, Korycińska, Paloma, Krakowska, Monika, Mozolewski, Maciej, Rak, Dorota, Zych, Magdalena, Wójcik, Magdalena, Nalepa, Grzegorz J.

arXiv.org Artificial Intelligence 

With the rapid development of black-box machine learning (ML) models, such as deep neural networks and gradient-boosted trees, the need for explanations of their decisions has emerged. This demand, driven by the increasing deployment of opaque models in high-risk and critical areas such as medicine, healthcare, industry, and law, laid the foundation for modern research on explainable and interpretable artificial intelligence (XAI). Scientists' efforts in designing XAI algorithms have been further supported by political initiatives such as DARPA's XAI challenge [1], the European Union's GDPR [2], and, more recently, the EU AI Act [3]. The shared goal of these initiatives is to improve the transparency of AI systems, thereby promoting their adoption in areas where trust in AI is not fully established or where the transparency of decisions is crucial for legal and safety reasons. However, as XAI algorithms have advanced, a new discussion has begun about the fundamental challenge of ensuring that the explanations these algorithms generate are comprehensible to humans. This triggered research on the evaluation of XAI [4] and drew attention from researchers in the social sciences, who argued that much of the effort in XAI relies solely on researchers' intuition about what constitutes a good explanation. They emphasized that human factors should be integral to the design and evaluation of XAI to ensure its reliability [5]. Recognizing individual differences in the ability to comprehend algorithmically generated explanations is crucial, as this ability can vary significantly with personal information competencies. Additionally, there is a lack of both established multidisciplinary methods for measuring these capabilities and datasets that facilitate reproducible evaluations or comprehensive analyses.