Explainable Benchmarking through the Lens of Concept Learning
Quannian Zhang, Michael Röder, Nikit Srivastava, N'Dah Jean Kouagou, Axel-Cyrille Ngonga Ngomo
arXiv.org Artificial Intelligence
Evaluating competing systems in a comparable way, i.e., benchmarking them, is an undeniable pillar of the scientific method. However, system performance is often summarized by a small number of metrics, while analyzing the evaluation details and deriving insights for further development or use remains a tedious manual task that often yields biased results. This paper therefore argues for a new type of benchmarking, dubbed explainable benchmarking, whose aim is to automatically generate explanations for the performance of systems in a benchmark. We provide a first instantiation of this paradigm for knowledge-graph-based question answering systems. We compute explanations using PruneCEL, a novel concept learning approach developed for large knowledge graphs. Our evaluation shows that PruneCEL outperforms state-of-the-art concept learners on the task of explainable benchmarking by up to 0.55 points in F1 measure. A task-driven user study with 41 participants shows that in 80% of the cases, the majority of participants can accurately predict the behavior of a system based on our explanations. Our code and data are available at https://github.com/dice-group/PruneCEL/tree/K-cap2025
Oct-24-2025
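To make the underlying idea concrete, the following minimal Python sketch learns a simple conjunctive concept that separates benchmark questions a system answered correctly from those it failed on, scored by F1 as in the paper's evaluation. All question identifiers, features, and the brute-force search below are hypothetical toy illustrations, not the paper's method; PruneCEL itself relies on pruning to scale this kind of concept search to large knowledge graphs.

```python
from itertools import combinations

# Toy knowledge graph: each benchmark question is an individual described
# by a set of features (e.g., expected answer type, question shape).
# All names here are hypothetical illustrations, not from the paper.
kg = {
    "q1": {"AsksForPerson", "SingleTriple"},
    "q2": {"AsksForPerson", "RequiresCount"},
    "q3": {"AsksForPlace", "SingleTriple"},
    "q4": {"AsksForPlace", "RequiresCount"},
    "q5": {"AsksForPerson", "SingleTriple"},
}

# Questions the benchmarked system answered correctly (positives) vs. not.
positives = {"q1", "q3", "q5"}
negatives = {"q2", "q4"}

def f1(concept: frozenset) -> float:
    """F1 of `concept` as a classifier that predicts 'answered correctly'
    for every question whose features include all features of the concept."""
    covered = {q for q, feats in kg.items() if concept <= feats}
    tp = len(covered & positives)
    if tp == 0:
        return 0.0
    precision = tp / len(covered)
    recall = tp / len(positives)
    return 2 * precision * recall / (precision + recall)

# Brute-force search over small conjunctions; a learner such as PruneCEL
# prunes this search space to remain tractable on large knowledge graphs.
features = set().union(*kg.values())
candidates = [frozenset(c) for r in (1, 2) for c in combinations(features, r)]
best = max(candidates, key=f1)
print(f"explanation: {' and '.join(sorted(best))}  (F1 = {f1(best):.2f})")
```

On this toy data, the search returns the concept SingleTriple with an F1 of 1.00, which can be read as an explanation of the sketched system's behavior: it answers single-triple questions correctly and fails on questions that require counting.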