AITopics | Mpala, Proud

Collaborating Authors

Mpala, Proud

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

KGGen: Extracting Knowledge Graphs from Plain Text with Language Models

Mo, Belinda, Yu, Kyssen, Kazdan, Joshua, Mpala, Proud, Yu, Lisa, Cundy, Chris, Kanatsoulis, Charilaos, Koyejo, Sanmi

arXiv.org Artificial IntelligenceFeb-14-2025

Recent interest in building foundation models for KGs has highlighted a fundamental challenge: knowledge-graph data is relatively scarce. The best-known KGs are primarily human-labeled, created by pattern-matching, or extracted using early NLP techniques. While human-generated KGs are in short supply, automatically extracted KGs are of questionable quality. We present a solution to this data scarcity problem in the form of a text-to-KG generator (KGGen), a package that uses language models to create high-quality graphs from plaintext. Unlike other KG extractors, KGGen clusters related entities to reduce sparsity in extracted KGs. KGGen is available as a Python library ( pip install kg-gen), making it accessible to everyone. Along with KGGen, we release the first benchmark, Measure of of Information in Nodes and Edges (MINE), that tests an extractor's ability to produce a useful KG from plain text. We benchmark our new tool against existing extractors and demonstrate far superior performance. Knowledge graph (KG) applications and Graph Retrieval-Augmented Generation (RAG) systems are increasingly bottlenecked by the scarcity and incompleteness of available KGs. KGs consist of a set of subject-predicate-object triples, and have become a fundamental data structure for information retrieval (Schneider, 1973). Most real-world KGs, including Wikidata (contributors, 2024), DBpedia (Lehmann et al., 2015), and Y AGO (Suchanek et al., 2007), are far from complete, with many missing relations between entities (Shenoy et al., 2021).

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2502.09956

Country:

Europe (0.68)
North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Banking & Finance > Trading (0.48)
Banking & Finance > Economy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback