Portisch, Jan
Entity Type Prediction Leveraging Graph Walks and Entity Descriptions
Biswas, Russa, Portisch, Jan, Paulheim, Heiko, Sack, Harald, Alam, Mehwish
The entity type information in Knowledge Graphs (KGs) such as DBpedia and Freebase is often incomplete due to automated generation or human curation. Entity typing is the task of assigning or inferring the semantic type of an entity in a KG. This paper presents GRAND, a novel approach for entity typing that leverages different graph walk strategies in RDF2vec together with textual entity descriptions. RDF2vec first generates graph walks and then uses a language model to obtain embeddings for each node in the graph. This study shows that the walk generation strategy and the embedding model have a significant effect on the performance of the entity typing task. The proposed approach outperforms the baseline approaches on the benchmark datasets DBpedia and FIGER for entity typing in KGs, for both fine-grained and coarse-grained classes. The results show that the combination of order-aware RDF2vec variants with contextual embeddings of the textual entity descriptions achieves the best results.
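GRAND builds on RDF2vec, which first extracts walks over the graph and then feeds the resulting token sequences to a word2vec-style model. A minimal, self-contained sketch of the walk-generation step only — the toy graph, entity names, and walk depth are illustrative, not taken from the paper:

```python
import random

# Hypothetical toy KG as adjacency lists: subject -> list of (predicate, object).
kg = {
    "dbr:Mannheim": [("dbo:country", "dbr:Germany"),
                     ("dbo:federalState", "dbr:Baden-Wuerttemberg")],
    "dbr:Germany": [("dbo:capital", "dbr:Berlin")],
    "dbr:Berlin": [("dbo:country", "dbr:Germany")],
}

def random_walk(graph, start, depth, rng):
    """Generate one random walk of up to `depth` hops; predicates are
    emitted as tokens too, as in RDF2vec."""
    walk = [start]
    node = start
    for _ in range(depth):
        edges = graph.get(node)
        if not edges:
            break
        pred, obj = rng.choice(edges)
        walk.extend([pred, obj])
        node = obj
    return walk

rng = random.Random(42)
walks = [random_walk(kg, "dbr:Mannheim", depth=2, rng=rng) for _ in range(5)]
# Each walk alternates entities and predicates; these token sequences would
# then be fed to a word2vec-style model to obtain node embeddings.
```

Different walk strategies (the knob the paper varies) would replace the uniform `rng.choice` with, e.g., weighted or community-biased edge selection.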
The DLCC Node Classification Benchmark for Analyzing Knowledge Graph Embeddings
Portisch, Jan, Paulheim, Heiko
Knowledge graph embedding is a representation learning technique that projects entities and relations in a knowledge graph into continuous vector spaces. Embeddings have gained a lot of uptake and have been heavily used in link prediction and other downstream prediction tasks. Most approaches are evaluated on a single task or a single group of tasks to determine their overall performance, and the evaluation assesses how well the embedding approach performs on the task at hand. Still, it is hardly evaluated (and often not even deeply understood) what information the embedding approaches actually learn to represent. To fill this gap, we present the DLCC (Description Logic Class Constructors) benchmark, a resource for analyzing embedding approaches in terms of which kinds of classes they can represent. Two gold standards are presented: one based on the real-world knowledge graph DBpedia and one that is purely synthetic. In addition, an evaluation framework is provided that implements an experiment protocol so that researchers can directly use the gold standards. To demonstrate the use of DLCC, we compare multiple embedding approaches using the gold standards. We find that many DL constructors on DBpedia are actually learned by recognizing correlated patterns different from those defined in the gold standard, and that specific DL constructors, such as cardinality constraints, are particularly hard to learn for most embedding approaches.
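The core idea — deriving node-classification gold standards from DL class constructors — can be illustrated with two of the constructor families the abstract mentions. The triples and relation names below are made up for illustration; they are not from the actual benchmark:

```python
# Toy triple set; "a" has two r-edges, "b" one, "c" none.
triples = [
    ("a", "r", "x"), ("a", "r", "y"),
    ("b", "r", "x"),
    ("c", "s", "x"),
]

def exists_r(node, rel):
    """Existential restriction (exists rel.T): node has at least one
    outgoing `rel` edge."""
    return any(s == node and p == rel for s, p, _ in triples)

def min_cardinality(node, rel, n):
    """Cardinality constraint (>= n rel.T) -- the constructor family the
    paper reports as hard for most embedding approaches."""
    return sum(1 for s, p, _ in triples if s == node and p == rel) >= n

nodes = {"a", "b", "c"}
# Positives for each constructor become the class labels of the benchmark.
gold_exists = {v for v in nodes if exists_r(v, "r")}           # a and b
gold_card2 = {v for v in nodes if min_cardinality(v, "r", 2)}  # only a
```

A node classifier trained on the embeddings is then scored on how well it separates these positives from sampled negatives.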
Matching with Transformers in MELT
Hertling, Sven, Portisch, Jan, Paulheim, Heiko
Among the strongest signals for the automated matching of ontologies and knowledge graphs are the textual descriptions of the concepts. The methods that are typically applied (such as character- or token-based comparisons) are relatively simple and therefore do not capture the actual meaning of the texts. With the rise of transformer-based language models, text comparison based on meaning (rather than lexical features) has become possible. In this paper, we model the ontology matching task as a classification problem and present approaches based on transformer models. We further provide an easy-to-use implementation in the MELT framework, which is suited for ontology and knowledge graph matching. We show that a transformer-based filter helps to choose the correct correspondences given a high-recall alignment, and already achieves good results with simple alignment post-processing methods.
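The filter step described above can be sketched as follows. To stay self-contained, this sketch substitutes a trivial token-overlap score for the transformer; in MELT, the scorer would be a fine-tuned language model over the description pair. All identifiers and descriptions are invented:

```python
def jaccard(a: str, b: str) -> float:
    """Stand-in scorer: token overlap between two descriptions.
    (The paper uses a transformer classifier here instead.)"""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def filter_alignment(candidates, threshold=0.5):
    """Keep only high-confidence correspondences from a high-recall
    candidate alignment, mimicking the transformer-based filter step."""
    return [(s, t) for s, t, desc_s, desc_t in candidates
            if jaccard(desc_s, desc_t) >= threshold]

# Candidate correspondences: (source concept, target concept, descriptions).
candidates = [
    ("ont1:Author", "ont2:Writer", "person who writes books",
     "a person who writes books or articles"),
    ("ont1:Author", "ont2:Reviewer", "person who writes books",
     "person assessing submitted papers"),
]
kept = filter_alignment(candidates)  # only the Author/Writer pair survives
```

The point of the framing: the matcher produces many candidates (high recall), and a pairwise classifier decides per pair whether it is a true correspondence.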
Putting RDF2vec in Order
Portisch, Jan, Paulheim, Heiko
The RDF2vec method for creating node embeddings on knowledge graphs is based on word2vec, which, in turn, is agnostic towards the position of context words. In this paper, we argue that this can be a shortcoming when training RDF2vec, and show that using a word2vec variant that respects word order yields considerable performance gains, especially on tasks involving entities of different classes.
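The difference between the two variants is visible already in the training pairs they extract from a walk. Classic skip-gram discards each context token's position, while an order-aware variant keeps the relative offset (so a predicate two hops before an entity is a different context than one two hops after). A small sketch with an invented walk:

```python
walk = ["dbr:Mannheim", "dbo:country", "dbr:Germany",
        "dbo:capital", "dbr:Berlin"]

def skipgram_pairs(tokens, window=2):
    """Classic skip-gram training pairs: (target, context) -- position lost."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def ordered_pairs(tokens, window=2):
    """Order-aware pairs: (target, (offset, context)) -- position kept,
    as in a structured-skip-gram-style model."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, (j - i, tokens[j])))
    return pairs
```

In the order-aware setting, the model learns separate context representations per offset, which is what lets it distinguish, e.g., subjects from objects of the same predicate.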
Background Knowledge in Schema Matching: Strategy vs. Data
Portisch, Jan, Hladik, Michael, Paulheim, Heiko
The use of external background knowledge can be beneficial for the task of automatically matching schemas or ontologies. In this paper, we exploit six general-purpose knowledge graphs as sources of background knowledge for the matching task. The background sources are evaluated by applying three different exploitation strategies. We find that explicit strategies still outperform latent ones, and that the choice of strategy has a greater impact on the final alignment than the actual background dataset to which the strategy is applied. While we could not identify a universally superior resource, BabelNet achieved consistently good results. Our best matcher configuration with BabelNet performs very competitively compared to other matching systems, even though no dataset-specific optimizations were made.
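To make the explicit-vs.-latent distinction concrete: an explicit strategy matches two labels if the background resource directly asserts a relation between them (e.g., synonymy), whereas a latent strategy compares derived vector representations. A toy sketch of the explicit case — the in-memory synonym table stands in for a resource like BabelNet and is entirely invented:

```python
# Made-up synonym sets standing in for a background knowledge resource.
synsets = [
    {"car", "automobile", "motorcar"},
    {"bank", "financial institution"},
]

def explicit_match(label_a: str, label_b: str) -> bool:
    """Explicit strategy: two labels match if they are equal or if some
    background synset contains both. A latent strategy would instead
    compare embedding vectors of the labels."""
    a, b = label_a.lower(), label_b.lower()
    return a == b or any(a in s and b in s for s in synsets)

# e.g. explicit_match("Car", "automobile") holds,
#      explicit_match("car", "bank") does not.
```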
Knowledge Graphs Evolution and Preservation -- A Technical Report from ISWS 2019
Abbas, Nacira, Alghamdi, Kholoud, Alinam, Mortaza, Alloatti, Francesca, Amaral, Glenda, d'Amato, Claudia, Asprino, Luigi, Beno, Martin, Bensmann, Felix, Biswas, Russa, Cai, Ling, Capshaw, Riley, Carriero, Valentina Anita, Celino, Irene, Dadoun, Amine, De Giorgis, Stefano, Delva, Harm, Domingue, John, Dumontier, Michel, Emonet, Vincent, van Erp, Marieke, Arias, Paola Espinoza, Fallatah, Omaima, Ferrada, Sebastián, Ocaña, Marc Gallofré, Georgiou, Michalis, Gesese, Genet Asefa, Gillis-Webber, Frances, Giovannetti, Francesca, Buey, Marìa Granados, Harrando, Ismail, Heibi, Ivan, Horta, Vitor, Huber, Laurine, Igne, Federico, Jaradeh, Mohamad Yaser, Keshan, Neha, Koleva, Aneta, Koteich, Bilal, Kurniawan, Kabul, Liu, Mengya, Ma, Chuangtao, Maas, Lientje, Mansfield, Martin, Mariani, Fabio, Marzi, Eleonora, Mesbah, Sepideh, Mistry, Maheshkumar, Tirado, Alba Catalina Morales, Nguyen, Anna, Nguyen, Viet Bach, Oelen, Allard, Pasqual, Valentina, Paulheim, Heiko, Polleres, Axel, Porena, Margherita, Portisch, Jan, Presutti, Valentina, Pustu-Iren, Kader, Mendez, Ariam Rivas, Roshankish, Soheil, Rudolph, Sebastian, Sack, Harald, Sakor, Ahmad, Salas, Jaime, Schleider, Thomas, Shi, Meilin, Spinaci, Gianmarco, Sun, Chang, Tietz, Tabea, Dhouib, Molka Tounsi, Umbrico, Alessandro, Berg, Wouter van den, Xu, Weiqin
One of the grand challenges discussed during the Dagstuhl Seminar "Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web" and described in its report is that of a "Public FAIR Knowledge Graph of Everything: We increasingly see the creation of knowledge graphs that capture information about the entirety of a class of entities. [...] This grand challenge extends this further by asking if we can create a knowledge graph of "everything" ranging from common sense concepts to location based entities. This knowledge graph should be "open to the public" in a FAIR manner democratizing this mass amount of knowledge." Although linked open data (LOD) is just one knowledge graph, it is the closest realisation (and probably the only one) of a public FAIR Knowledge Graph (KG) of everything. Certainly, LOD provides a unique testbed for experimenting with and evaluating research hypotheses on open and FAIR KGs. One of the most neglected FAIR issues for KGs is their ongoing evolution and long-term preservation. We want to investigate this problem, that is, to understand what preserving and supporting the evolution of KGs means and how these problems can be addressed. Clearly, the problem can be approached from different perspectives and may require the development of different approaches, including new theories, ontologies, metrics, strategies, procedures, etc. This document reports on a collaborative effort performed by nine teams of students, each guided by a senior researcher as their mentor, attending the International Semantic Web Research School (ISWS 2019). Each team provides a different perspective on the problem of knowledge graph evolution, substantiated by a set of research questions as the main subject of their investigation. In addition, the teams provide their working definitions of KG preservation and evolution.
RDF2Vec Light -- A Lightweight Approach for Knowledge Graph Embeddings
Portisch, Jan, Hladik, Michael, Paulheim, Heiko
Knowledge graph embedding approaches represent the nodes and edges of graphs as mathematical vectors. Current approaches focus on embedding complete knowledge graphs, i.e., all nodes and edges. This leads to very high computational requirements on large graphs such as DBpedia or Wikidata. However, for most downstream application scenarios, only a small subset of concepts is of actual interest. In this paper, we present RDF2Vec Light, a lightweight embedding approach based on RDF2Vec which generates vectors for only a subset of entities. To that end, RDF2Vec Light traverses and processes only a subgraph of the knowledge graph. Due to significantly lower runtime and hardware requirements, our method enables the use of embeddings of very large knowledge graphs in scenarios where such embeddings were not feasible before.
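The subgraph restriction described above can be sketched as a bounded breadth-first expansion: collect the k-hop neighbourhood of the entities of interest and generate walks only inside it. The toy graph and hop count below are illustrative, not from the paper:

```python
from collections import deque

# Toy directed adjacency lists; "e6" only has an incoming edge to "e1",
# so it stays outside e1's forward neighbourhood.
graph = {
    "e1": ["e2", "e3"],
    "e2": ["e4"],
    "e3": [],
    "e4": ["e5"],
    "e5": [],
    "e6": ["e1"],
}

def k_hop_subgraph(adj, seeds, k):
    """Breadth-first expansion up to k hops from the seed entities of
    interest; walks would then be generated only over these nodes."""
    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nb in adj.get(node, []):
            if nb not in visited:
                visited.add(nb)
                frontier.append((nb, depth + 1))
    return visited

sub = k_hop_subgraph(graph, ["e1"], k=2)  # e5 and e6 are excluded
```

Since walk generation and training touch only `sub`, the cost scales with the neighbourhood size rather than the full graph.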
Challenges of Linking Organizational Information in Open Government Data to Knowledge Graphs
Portisch, Jan, Fallatah, Omaima, Neumaier, Sebastian, Jaradeh, Mohamad Yaser, Polleres, Axel
Open Government Data (OGD) is being published by various public administration organizations around the globe. Within the metadata of OGD data catalogs, the publishing organizations (1) are not uniquely and unambiguously identifiable and, even worse, (2) change over time as public administration units are merged or restructured. In order to enable fine-grained analyses or searches of Open Government Data on the level of publishing organizations, linking those organizations from OGD portals to publicly available knowledge graphs (KGs) such as Wikidata and DBpedia seems like an obvious solution. Still, as we show in this position paper, organization linking faces significant challenges, both in terms of the available (portal) metadata and in terms of the data quality and completeness of the KGs. We specifically highlight five main challenges, namely (1) temporal changes in organizations and in the portal metadata, (2) the lack of a base ontology for describing organizational structures and changes in public knowledge graphs, (3) metadata and KG data quality, (4) multilinguality, and (5) the disambiguation of public sector organizations. Based on available OGD portal metadata from the Open Data Portal Watch, we provide an in-depth analysis of these issues and make suggestions for concrete starting points on how to tackle them, along with a call to the community to jointly work on these open challenges.