ProVe: A Pipeline for Automated Provenance Verification of Knowledge Graphs against Textual Sources

Amaral, Gabriel, Rodrigues, Odinaldo, Simperl, Elena

Oct-26-2022–arXiv.org Artificial Intelligence

A Knowledge Graph (KG) is a type of knowledge base that stores information in the form of semantic triples formed by a subject, a predicate, and an object. KGs represent both real and abstract things, such as The Moon or Happiness, internally as labelled and uniquely identifiable entities, and can amass information from a multitude of domains and sources by connecting such entities to one another or to literals through relationships encoded via uniquely identified predicates. KGs serve as sources of both human- and machine-readable semantically structured data for various crucial applications in the modern web landscape, such as Wikipedia infoboxes, search engine results, voice-activated assistants, and information gathering projects [30]. Developed and maintained by ontology experts, data curators, and even anonymous volunteers, KGs have grown massively in size and adoption in the last decade, mainly as secondary sources of information. That is, they do not store new information but draw it from authoritative and reliable sources, which are explicitly referenced. As such, KGs depend on well-documented and verifiable provenance to ensure they are regarded as trustworthy and usable [56]. Processes to assess and assure the quality of information provenance are thus crucial to KGs, especially measuring and maintaining verifiability, i.e. the degree to which consumers of KG triples can attest that these are truly supported by their sources [56]. However, such processes are currently performed mostly manually, which does not scale. Manually ensuring high verifiability on vital KGs such as Wikidata and DBpedia is prohibitive due to their sheer size.
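As a rough illustration of the triple structure described above, the following minimal Python sketch models entities, predicates, and triples that carry explicit source references. All class names, identifiers, and the example URL are hypothetical and not taken from the paper. The `referenced_share` helper is only a crude proxy: it counts triples that cite at least one source, and does not check whether the source actually supports the triple, which is the harder verifiability question the abstract raises.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Entity:
    uid: str    # unique identifier (hypothetical scheme)
    label: str  # human-readable label, e.g. "The Moon"


@dataclass(frozen=True)
class Predicate:
    uid: str    # unique identifier of the relationship
    label: str


@dataclass
class Triple:
    subject: Entity
    predicate: Predicate
    obj: Union[Entity, str]  # another entity or a literal value
    references: List[str] = field(default_factory=list)  # cited source URLs


def referenced_share(triples: List[Triple]) -> float:
    """Fraction of triples citing at least one source (a proxy only)."""
    if not triples:
        return 0.0
    return sum(1 for t in triples if t.references) / len(triples)


# Hypothetical example data.
moon = Entity("E1", "The Moon")
earth = Entity("E2", "Earth")
orbits = Predicate("P1", "orbits")
named = Predicate("P2", "name")

t1 = Triple(moon, orbits, earth, ["https://example.org/source"])
t2 = Triple(moon, named, "Moon")  # literal object, no reference

share = referenced_share([t1, t2])
```

Here one of the two triples cites a source, so `share` is 0.5. Deciding whether the cited text truly supports the triple would require inspecting the source itself.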



