Explainable Benchmarking through the Lens of Concept Learning
Quannian Zhang, Michael Röder, Nikit Srivastava, N'Dah Jean Kouagou, Axel-Cyrille Ngonga Ngomo
arXiv.org Artificial Intelligence
Evaluating competing systems in a comparable way, i.e., benchmarking them, is an undeniable pillar of the scientific method. However, system performance is often summarized by a small number of metrics, while analyzing the evaluation details and deriving insights for further development or use remains a tedious manual task that often yields biased results. This paper therefore argues for a new type of benchmarking, dubbed explainable benchmarking, whose aim is to automatically generate explanations for the performance of systems in a benchmark. We provide a first instantiation of this paradigm for knowledge-graph-based question answering systems. We compute explanations using PruneCEL, a novel concept learning approach developed for large knowledge graphs. Our evaluation shows that PruneCEL outperforms state-of-the-art concept learners on the task of explainable benchmarking by up to 0.55 points in F1 measure. A task-driven user study with 41 participants shows that in 80% of the cases, the majority of participants can accurately predict the behavior of a system based on our explanations. Our code and data are available at https://github.com/dice-group/PruneCEL/tree/K-cap2025
Oct-24-2025
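To make the underlying idea concrete, the following minimal Python sketch learns a simple conjunctive concept that separates benchmark questions a system answered correctly from those it failed on, scored by F1 as in the paper's evaluation. All question identifiers, features, and the brute-force search below are hypothetical toy illustrations, not the paper's method; PruneCEL itself relies on pruning to scale this kind of concept search to large knowledge graphs.

```python
from itertools import combinations

# Toy knowledge graph: each benchmark question is an individual described
# by a set of features (e.g., expected answer type, question shape).
# All names here are hypothetical illustrations, not from the paper.
kg = {
    "q1": {"AsksForPerson", "SingleTriple"},
    "q2": {"AsksForPerson", "RequiresCount"},
    "q3": {"AsksForPlace", "SingleTriple"},
    "q4": {"AsksForPlace", "RequiresCount"},
    "q5": {"AsksForPerson", "SingleTriple"},
}

# Questions the benchmarked system answered correctly (positives) vs. not.
positives = {"q1", "q3", "q5"}
negatives = {"q2", "q4"}

def f1(concept: frozenset) -> float:
    """F1 of `concept` as a classifier that predicts 'answered correctly'
    for every question whose features include all features of the concept."""
    covered = {q for q, feats in kg.items() if concept <= feats}
    tp = len(covered & positives)
    if tp == 0:
        return 0.0
    precision = tp / len(covered)
    recall = tp / len(positives)
    return 2 * precision * recall / (precision + recall)

# Brute-force search over small conjunctions; a learner such as PruneCEL
# prunes this search space to remain tractable on large knowledge graphs.
features = set().union(*kg.values())
candidates = [frozenset(c) for r in (1, 2) for c in combinations(features, r)]
best = max(candidates, key=f1)
print(f"explanation: {' and '.join(sorted(best))}  (F1 = {f1(best):.2f})")
```

On this toy data, the search returns the concept SingleTriple with an F1 of 1.00, which can be read as an explanation of the sketched system's behavior: it answers single-triple questions correctly and fails on questions that require counting.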