Goto

Collaborating Authors

 paulheim


gpuRDF2vec -- Scalable GPU-based RDF2vec

arXiv.org Artificial Intelligence

Generating Knowledge Graph (KG) embeddings at web scale remains challenging. Among existing techniques, RDF2vec combines effectiveness with strong scalability. We present gpuRDF2vec, an open source library that harnesses modern GPUs and supports multi-node execution to accelerate every stage of the RDF2vec pipeline. Extensive experiments on both synthetically generated graphs and real-world benchmarks show that gpuRDF2vec achieves up to a substantial speedup over the currently fastest alternative, i.e., jRDF2vec. In a single-node setup, our walk-extraction phase alone outperforms pyRDF2vec, SparkKGML, and jRDF2vec by a substantial margin using random walks on large/ dense graphs, and scales very well to longer walks, which typically lead to better quality embeddings. Our implementation of gpuRDF2vec enables practitioners and researchers to train high-quality KG embeddings on large-scale graphs within practical time budgets and builds on top of Pytorch Lightning for the scalable word2vec implementation.


GeoRDF2Vec Learning Location-Aware Entity Representations in Knowledge Graphs

arXiv.org Artificial Intelligence

Many knowledge graphs contain a substantial number of spatial entities, such as cities, buildings, and natural landmarks. For many of these entities, exact geometries are stored within the knowledge graphs. However, most existing approaches for learning entity representations do not take these geometries into account. In this paper, we introduce a variant of RDF2Vec that incorporates geometric information to learn location-aware embeddings of entities. Our approach expands different nodes by flooding the graph from geographic nodes, ensuring that each reachable node is considered. Based on the resulting flooded graph, we apply a modified version of RDF2Vec that biases graph walks using spatial weights. Through evaluations on multiple benchmark datasets, we demonstrate that our approach outperforms both non-location-aware RDF2Vec and GeoTransE.


Dealing with Inconsistency for Reasoning over Knowledge Graphs: A Survey

arXiv.org Artificial Intelligence

In Knowledge Graphs (KGs), where the schema of the data is usually defined by particular ontologies, reasoning is a necessity to perform a range of tasks, such as retrieval of information, question answering, and the derivation of new knowledge. However, information to populate KGs is often extracted (semi-) automatically from natural language resources, or by integrating datasets that follow different semantic schemas, resulting in KG inconsistency. This, however, hinders the process of reasoning. In this survey, we focus on how to perform reasoning on inconsistent KGs, by analyzing the state of the art towards three complementary directions: a) the detection of the parts of the KG that cause the inconsistency, b) the fixing of an inconsistent KG to render it consistent, and c) the inconsistency-tolerant reasoning. We discuss existing work from a range of relevant fields focusing on how, and in which cases they are related to the above directions. We also highlight persisting challenges and future directions.


Markov Process-Based Graph Convolutional Networks for Entity Classification in Knowledge Graphs

arXiv.org Artificial Intelligence

Despite the vast amount of information encoded in Knowledge Graphs (KGs), information about the class affiliation of entities remains often incomplete. Graph Convolutional Networks (GCNs) have been shown to be effective predictors of complete information about the class affiliation of entities in KGs. However, these models do not learn the class affiliation of entities in KGs incorporating the complexity of the task, which negatively affects the models prediction capabilities. To address this problem, we introduce a Markov process-based architecture into well-known GCN architectures. This end-to-end network learns the prediction of class affiliation of entities in KGs within a Markov process. The number of computational steps is learned during training using a geometric distribution. At the same time, the loss function combines insights from the field of evidential learning. The experiments show a performance improvement over existing models in several studied architectures and datasets. Based on the chosen hyperparameters for the geometric distribution, the expected number of computation steps can be adjusted to improve efficiency and accuracy during training.


SnapE -- Training Snapshot Ensembles of Link Prediction Models

arXiv.org Artificial Intelligence

Snapshot ensembles have been widely used in various fields of prediction. They allow for training an ensemble of prediction models at the cost of training a single one. They are known to yield more robust predictions by creating a set of diverse base models. In this paper, we introduce an approach to transfer the idea of snapshot ensembles to link prediction models in knowledge graphs. Moreover, since link prediction in knowledge graphs is a setup without explicit negative examples, we propose a novel training loop that iteratively creates negative examples using previous snapshot models. An evaluation with four base models across four datasets shows that this approach constantly outperforms the single model approach, while keeping the training time constant.


CleanGraph: Human-in-the-loop Knowledge Graph Refinement and Completion

arXiv.org Artificial Intelligence

This paper presents CleanGraph, an interactive web-based tool designed to facilitate the refinement and completion of knowledge graphs. Maintaining the reliability of knowledge graphs, which are grounded in high-quality and error-free facts, is crucial for real-world applications such as question-answering and information retrieval systems. These graphs are often automatically assembled from textual sources by extracting semantic triples via information extraction. However, assuring the quality of these extracted triples, especially when dealing with large or low-quality datasets, can pose a significant challenge and adversely affect the performance of downstream applications. CleanGraph allows users to perform Create, Read, Update, and Delete (CRUD) operations on their graphs, as well as apply models in the form of plugins for graph refinement and completion tasks. These functionalities enable users to enhance the integrity and reliability of their graph data. A demonstration of CleanGraph and its source code can be accessed at https://github.com/nlp-tlp/CleanGraph under the MIT License.


The Time Traveler's Guide to Semantic Web Research: Analyzing Fictitious Research Themes in the ESWC "Next 20 Years" Track

arXiv.org Artificial Intelligence

What will Semantic Web research focus on in 20 years from now? We asked this question to the community and collected their visions in the "Next 20 years" track of ESWC 2023. We challenged the participants to submit "future" research papers, as if they were submitting to the 2043 edition of the conference. The submissions - entirely fictitious - were expected to be full scientific papers, with research questions, state of the art references, experimental results and future work, with the goal to get an idea of the research agenda for the late 2040s and early 2050s. We received ten submissions, eight of which were accepted for presentation at the conference, that mixed serious ideas of potential future research themes and discussion topics with some fun and irony. In this paper, we intend to provide a survey of those "science fiction" papers, considering the emerging research themes and topics, analysing the research methods applied by the authors in these very special submissions, and investigating also the most fictitious parts (e.g., neologisms, fabricated references). Our goal is twofold: on the one hand, we investigate what this special track tells us about the Semantic Web community and, on the other hand, we aim at getting some insights on future research practices and directions.


Gollum: A Gold Standard for Large Scale Multi Source Knowledge Graph Matching

arXiv.org Artificial Intelligence

The number of Knowledge Graphs (KGs) generated with automatic and manual approaches is constantly growing. For an integrated view and usage, an alignment between these KGs is necessary on the schema as well as instance level. While there are approaches that try to tackle this multi source knowledge graph matching problem, large gold standards are missing to evaluate their effectiveness and scalability. We close this gap by presenting Gollum -- a gold standard for large-scale multi source knowledge graph matching with over 275,000 correspondences between 4,149 different KGs. They originate from knowledge graphs derived by applying the DBpedia extraction framework to a large wiki farm. Three variations of the gold standard are made available: (1) a version with all correspondences for evaluating unsupervised matching approaches, and two versions for evaluating supervised matching: (2) one where each KG is contained both in the train and test set, and (3) one where each KG is exclusively contained in the train or the test set.


Entity Type Prediction Leveraging Graph Walks and Entity Descriptions

arXiv.org Artificial Intelligence

The entity type information in Knowledge Graphs (KGs) such as DBpedia, Freebase, etc. is often incomplete due to automated generation or human curation. Entity typing is the task of assigning or inferring the semantic type of an entity in a KG. This paper presents \textit{GRAND}, a novel approach for entity typing leveraging different graph walk strategies in RDF2vec together with textual entity descriptions. RDF2vec first generates graph walks and then uses a language model to obtain embeddings for each node in the graph. This study shows that the walk generation strategy and the embedding model have a significant effect on the performance of the entity typing task. The proposed approach outperforms the baseline approaches on the benchmark datasets DBpedia and FIGER for entity typing in KGs for both fine-grained and coarse-grained classes. The results show that the combination of order-aware RDF2vec variants together with the contextual embeddings of the textual entity descriptions achieve the best results.


The DLCC Node Classification Benchmark for Analyzing Knowledge Graph Embeddings

arXiv.org Artificial Intelligence

Knowledge graph embedding is a representation learning technique that projects entities and relations in a knowledge graph to continuous vector spaces. Embeddings have gained a lot of uptake and have been heavily used in link prediction and other downstream prediction tasks. Most approaches are evaluated on a single task or a single group of tasks to determine their overall performance. The evaluation is then assessed in terms of how well the embedding approach performs on the task at hand. Still, it is hardly evaluated (and often not even deeply understood) what information the embedding approaches are actually learning to represent. To fill this gap, we present the DLCC (Description Logic Class Constructors) benchmark, a resource to analyze embedding approaches in terms of which kinds of classes they can represent. Two gold standards are presented, one based on the real-world knowledge graph DBpedia and one synthetic gold standard. In addition, an evaluation framework is provided that implements an experiment protocol so that researchers can directly use the gold standard. To demonstrate the use of DLCC, we compare multiple embedding approaches using the gold standards. We find that many DL constructors on DBpedia are actually learned by recognizing different correlated patterns than those defined in the gold standard and that specific DL constructors, such as cardinality constraints, are particularly hard to be learned for most embedding approaches.