How Sharp and Bias-Robust is a Model? Dual Evaluation Perspectives on Knowledge Graph Completion
arXiv.org Artificial Intelligence
Knowledge graph completion (KGC) aims to predict missing facts from the observed KG. While a number of KGC models have been studied, the evaluation of KGC still remains underexplored. In this paper, we observe that existing metrics overlook two key perspectives for KGC evaluation: (A1) predictive sharpness -- the degree of strictness in evaluating an individual prediction, and (A2) popularity-bias robustness -- the ability to predict low-popularity entities. To reflect both perspectives, we propose a novel evaluation framework (PROBE), which consists of a rank transformer (RT) that estimates the score of each prediction based on a required level of predictive sharpness and a rank aggregator (RA) that aggregates all the scores in a popularity-aware manner. Experiments on real-world KGs reveal that existing metrics tend to over- or under-estimate the accuracy of KGC models, whereas PROBE yields a comprehensive understanding of KGC models and reliable evaluation results.
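For readers unfamiliar with rank-based KGC evaluation, the following minimal sketch shows two standard metrics, MRR and Hits@K, each of which maps a prediction's rank to a score and then averages the scores uniformly. The abstract does not give PROBE's formulas, so nothing here reproduces its RT or RA; the function `inverse_popularity_weighted_mean`, the sample ranks, and the popularity counts are hypothetical, included only to illustrate what "popularity-aware" aggregation could look like.

```python
from typing import Sequence


def mrr(ranks: Sequence[int]) -> float:
    """Mean reciprocal rank: each prediction scores 1/rank, averaged uniformly."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Hits@K: each prediction scores 1 if rank <= k, else 0, averaged uniformly."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def inverse_popularity_weighted_mean(
    scores: Sequence[float], popularity: Sequence[int]
) -> float:
    """Hypothetical aggregator (an assumption, NOT PROBE's RA):
    weight each score by 1/popularity so rare entities count more."""
    weights = [1.0 / p for p in popularity]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Made-up example: rank of the correct entity for four test queries,
# and a made-up popularity count (e.g. training-set degree) per entity.
ranks = [1, 2, 10, 50]
popularity = [100, 50, 2, 1]

uniform_mrr = mrr(ranks)
weighted_mrr = inverse_popularity_weighted_mean(
    [1.0 / r for r in ranks], popularity
)

print(f"MRR:                 {uniform_mrr:.3f}")
print(f"Hits@1:              {hits_at_k(ranks, 1):.3f}")
print(f"Hits@10:             {hits_at_k(ranks, 10):.3f}")
print(f"Popularity-weighted: {weighted_mrr:.3f}")
```

In this toy data the model ranks popular entities well and rare ones poorly, so the uniform average comes out higher than the popularity-weighted one. Hits@1 is a stricter per-prediction score than Hits@10, which is one informal way to read the abstract's notion of predictive sharpness.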
Dec-9-2025
- Country:
  - Asia
    - China > Hong Kong (0.04)
    - South Korea > Seoul
      - Seoul (0.04)
  - Europe > Belgium
    - Brussels-Capital Region > Brussels (0.04)
  - North America > United States
    - Idaho > Ada County
      - Boise (0.05)
    - New York > New York County
      - New York City (0.04)
    - Texas > Travis County
      - Austin (0.04)
- Genre:
  - Research Report (1.00)
- Industry:
  - Health & Medicine (0.47)
- Technology: