KGGen: Extracting Knowledge Graphs from Plain Text with Language Models

Neural Information Processing Systems, Jun-15-2026, 21:40:37 GMT

Recent interest in building foundation models for knowledge graphs has highlighted a fundamental challenge: knowledge graph data is scarce. The best-known knowledge graphs are primarily human-labeled, created by pattern-matching, or extracted using early NLP techniques. Human-generated knowledge graphs are in short supply, and automatically extracted ones are of questionable quality. We present KGGen, a text-to-knowledge-graph generator that uses language models to extract high-quality graphs from plain text. Unlike other KG generators, KGGen uses a novel entity resolution approach that clusters and de-duplicates related entities, significantly reducing the sparsity that plagues existing extractors. Along with KGGen, we release Measure of Information in Nodes and Edges (MINE), the first benchmark to test an extractor's ability to produce a useful KG from plain text. We benchmark KGGen against leading existing generators such as Microsoft's GraphRAG: it achieves comparable retrieval accuracy on the generated graphs and better information retention.
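The abstract does not describe how KGGen's entity resolution works beyond clustering and de-duplicating related entities. To illustrate why merging entities reduces sparsity, the Python sketch below swaps in a deliberately crude stand-in for the language-model step: entity surface forms are grouped by a normalized key (lowercased, non-alphanumeric characters stripped). The (subject, predicate, object) triple format, the helper names `normalize`, `cluster_entities`, `resolve` and `avg_degree`, the toy triples, and the use of average node degree as a sparsity proxy are all assumptions made for this example, not KGGen's API or method.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Assumed triple format: (subject, predicate, object).
Triple = Tuple[str, str, str]


def normalize(name: str) -> str:
    """Crude stand-in for LM-based matching: lowercase and drop non-alphanumerics."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def cluster_entities(triples: List[Triple]) -> Dict[str, str]:
    """Map every entity surface form to one representative of its cluster."""
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for subj, _, obj in triples:
        clusters[normalize(subj)].add(subj)
        clusters[normalize(obj)].add(obj)
    canonical: Dict[str, str] = {}
    for members in clusters.values():
        # Deterministic choice: shortest surface form, ties broken lexicographically.
        rep = min(members, key=lambda m: (len(m), m))
        for member in members:
            canonical[member] = rep
    return canonical


def resolve(triples: List[Triple]) -> List[Triple]:
    """Rewrite triples onto cluster representatives and drop exact duplicates."""
    canonical = cluster_entities(triples)
    return sorted({(canonical[s], p, canonical[o]) for s, p, o in triples})


def avg_degree(triples: List[Triple]) -> float:
    """Average node degree (2E / V), used here as a simple density proxy."""
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    return 2 * len(triples) / len(nodes) if nodes else 0.0


# Hypothetical extractor output with inconsistent entity spellings.
raw: List[Triple] = [
    ("KGGen", "extracts", "knowledge graphs"),
    ("kggen", "is compared with", "GraphRAG"),
    ("Microsoft", "developed", "Graph RAG"),
    ("KGGen", "extracts", "Knowledge Graphs"),
]
resolved = resolve(raw)
print(resolved)
print(round(avg_degree(raw), 2), avg_degree(resolved))
```

On this toy input the four raw triples mention seven distinct node strings (average degree about 1.14); after resolution they collapse to three triples over four nodes (average degree 1.5), so the same facts sit in a denser graph. A language-model-based resolver could in principle also merge entities that string normalization misses, such as synonyms or abbreviations.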

Keywords: large language model, machine learning, natural language
