Semantic-KG: Using Knowledge Graphs to Construct Benchmarks for Measuring Semantic Similarity

Jun-15-2026, 18:30:34 GMT–Neural Information Processing Systems

Evaluating the open-form textual responses generated by Large Language Models (LLMs) typically requires measuring the semantic similarity of the response to a (human-generated) reference. However, there is evidence that current semantic similarity methods may capture syntactic or lexical forms over semantic content. While benchmarks exist for semantic equivalence, they often suffer from high generation costs due to reliance on subjective human judgment, limited availability for domain-specific applications, and unclear definitions of equivalence. This paper introduces a novel method for generating benchmarks to evaluate semantic similarity methods for LLM outputs, specifically addressing these limitations. Our approach leverages knowledge graphs (KGs) to generate pairs of natural language statements that are semantically similar or dissimilar, with dissimilar pairs categorized into one of four sub-types. We generate benchmark datasets in four different domains (general knowledge, biomedicine, finance, biology) and conduct a comparative study of semantic similarity methods, including traditional natural language processing scores and LLM-as-a-judge predictions. We observe that both the sub-type of semantic variation and the domain of the benchmark affect the performance of semantic similarity methods, with no method being consistently superior.
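The pair-generation idea can be illustrated with a minimal sketch. This is not the paper's implementation: the `Triple` type, the two verbalization templates, the `swap_object` perturbation, and the toy entities are all hypothetical. The abstract does not name the four dissimilarity sub-types, so the single perturbation shown here (replacing the object entity) is only an assumed example of how a KG edit could yield a dissimilar statement. A token-overlap (Jaccard) score stands in for a purely lexical similarity method.

```python
import random
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One knowledge-graph edge: (subject, relation, object)."""
    subject: str
    relation: str
    obj: str


# Hypothetical templates: two surface forms for the same underlying fact.
TEMPLATE_A = "{s} {r} {o}"
TEMPLATE_B = "The thing that {s} {r} is {o}"


def verbalize(t: Triple, template: str) -> str:
    """Render a triple as a natural language statement."""
    return template.format(s=t.subject, r=t.relation, o=t.obj)


def swap_object(t: Triple, candidates: list[str], rng: random.Random) -> Triple:
    """Assumed perturbation: replace the object with a different entity."""
    options = [c for c in candidates if c != t.obj]
    return Triple(t.subject, t.relation, rng.choice(options))


def build_pairs(t: Triple, candidates: list[str], rng: random.Random):
    """Return one similar and one dissimilar (text_a, text_b, label) pair."""
    reference = verbalize(t, TEMPLATE_A)
    paraphrase = verbalize(t, TEMPLATE_B)  # same fact, different wording
    perturbed = verbalize(swap_object(t, candidates, rng), TEMPLATE_A)
    return [
        (reference, paraphrase, "similar"),
        (reference, perturbed, "dissimilar"),
    ]


def jaccard(a: str, b: str) -> float:
    """Lexical baseline: overlap of lowercase word sets."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    return len(ta & tb) / len(ta | tb)


# Toy data, invented for illustration only.
fact = Triple("Aspirin", "treats", "headache")
entities = ["headache", "glucose", "fever"]
pairs = build_pairs(fact, entities, random.Random(0))

for text_a, text_b, label in pairs:
    print(f"{label:10s} jaccard={jaccard(text_a, text_b):.2f}  {text_a!r} vs {text_b!r}")
```

In this toy case the dissimilar pair shares more tokens with the reference than the paraphrase does, so the lexical score ranks the two pairs in the wrong order. That is the kind of failure a benchmark with known ground-truth labels is meant to expose.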

large language model, machine learning, natural language

Country:
- Asia (1.00)
- North America > United States
  - New York (0.28)
- Europe
  - Switzerland (0.67)
  - United Kingdom > England (0.27)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Media (1.00)
- Banking & Finance > Economy (0.92)
- Health & Medicine > Therapeutic Area (0.92)
- Leisure & Entertainment (0.92)
- Law (0.67)
- Government > Regional Government
  - North America Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Text Processing (1.00)
    - Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)