pykeen
Enhancing PyKEEN with Multiple Negative Sampling Solutions for Knowledge Graph Embedding Models
d'Amato, Claudia, Diliso, Ivan, Fanizzi, Nicola, Saeed, Zafar
Embedding methods have become popular due to their scalability on link prediction and/or triple classification tasks on Knowledge Graphs. Embedding models are trained relying on both positive and negative samples of triples. However, in the absence of negative assertions, these must usually be generated artificially using various negative sampling strategies, ranging from random corruption to more sophisticated techniques, which have an impact on the overall performance. Most of the popular libraries for knowledge graph embedding support only basic strategies of this kind and lack advanced solutions. To address this gap, we deliver an extension for the popular KGE framework PyKEEN that integrates a suite of several advanced negative samplers (including both static and dynamic corruption strategies) within a consistent modular architecture, to generate meaningful negative samples while remaining compatible with existing PyKEEN-based workflows and pipelines. The developed extension not only enhances PyKEEN itself but also allows for easier and more comprehensive development of embedding methods and/or for their customization. As a proof of concept, we present a comprehensive empirical study of the developed extensions and their impact on the performance (link prediction tasks) of different embedding methods, which also provides useful insights for the design of more effective strategies.
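As a rough illustration of the simplest strategy the abstract mentions, random corruption, the following minimal sketch replaces the head or tail of a positive triple with a randomly chosen entity and rejects candidates that are already known positives. It does not use PyKEEN or the extension described above; the function name `corrupt_triple`, its parameters, and the toy triples are all hypothetical and chosen only for this example.

```python
import random


def corrupt_triple(triple, entities, known_triples, rng, max_tries=100):
    """Return a negative triple by randomly corrupting the head or the tail.

    Hypothetical helper for illustration only; not part of PyKEEN.
    Candidates that appear in `known_triples` are rejected, so the result
    is never a known positive.
    """
    head, relation, tail = triple
    for _ in range(max_tries):
        candidate = rng.choice(entities)
        # Corrupt the head or the tail with equal probability.
        if rng.random() < 0.5:
            negative = (candidate, relation, tail)
        else:
            negative = (head, relation, candidate)
        if negative not in known_triples:
            return negative
    raise RuntimeError("no valid negative found within max_tries")


# Toy knowledge graph (made-up data).
positives = {("a", "r", "b"), ("b", "r", "c")}
entities = ["a", "b", "c"]
rng = random.Random(0)
negatives = [corrupt_triple(t, entities, positives, rng) for t in sorted(positives)]
```

The rejection step is one common way to avoid sampling true facts as negatives; the more sophisticated static and dynamic strategies the abstract refers to would replace the uniform `rng.choice` with an informed choice of candidate.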
The KEEN Universe: An Ecosystem for Knowledge Graph Embeddings with a Focus on Reproducibility and Transferability
Ali, Mehdi, Jabeen, Hajira, Hoyt, Charles Tapley, Lehmann, Jens
There is an emerging trend of embedding knowledge graphs (KGs) in continuous vector spaces in order to use those for machine learning tasks. Recently, many knowledge graph embedding (KGE) models have been proposed that learn low dimensional representations while trying to maintain the structural properties of the KGs such as the similarity of nodes depending on their edges to other nodes. KGEs can be used to address tasks within KGs such as the prediction of novel links and the disambiguation of entities. They can also be used for downstream tasks like question answering and fact-checking. Overall, these tasks are relevant for the semantic web community. Despite their popularity, the reproducibility of KGE experiments and the transferability of proposed KGE models to research fields outside the machine learning community can be a major challenge. Therefore, we present the KEEN Universe, an ecosystem for knowledge graph embeddings that we have developed with a strong focus on reproducibility and transferability. The KEEN Universe currently consists of the Python packages PyKEEN (Python KnowlEdge EmbeddiNgs), BioKEEN (Biological KnowlEdge EmbeddiNgs), and the KEEN Model Zoo for sharing trained KGE models with the community.
SmartDataAnalytics/BioKEEN
BioKEEN (Biological KnowlEdge EmbeddiNgs) is a package for training and evaluating biological knowledge graph embeddings built on PyKEEN. Because we use PyKEEN as the underlying software package, implementations of 10 knowledge graph embedding models are currently available for BioKEEN. Furthermore, BioKEEN can be run in training mode, in which users provide their own set of hyper-parameter values, or in hyper-parameter optimization mode, to find suitable hyper-parameter values from a set of user-defined values. Through the integration of the Bio2BEL [2] software, numerous biomedical databases are directly accessible within BioKEEN. BioKEEN can also be run without programming experience by using its interactive command line interface, which can be started with the command "biokeen" from a terminal.
SmartDataAnalytics/PyKEEN
PyKEEN (Python KnowlEdge EmbeddiNgs) is a package for training and evaluating knowledge graph embeddings. Currently, it provides implementations of 10 knowledge graph embedding models, and can be run in training mode, in which users provide their own set of hyper-parameter values, or in hyper-parameter optimization mode, to find suitable hyper-parameter values from a set of user-defined values. PyKEEN can also be run without programming experience by using its interactive command line interface, which can be started with the command "pykeen" from a terminal. We are currently working on PyKEEN 1.0, which will provide additional features such as several negative sampling approaches and further evaluation metrics. Furthermore, we are integrating additional KGE models.
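To make the difference between the two modes concrete, here is a minimal sketch of selecting hyper-parameter values from user-defined candidate sets. It is not PyKEEN code and does not claim to reproduce PyKEEN's search procedure: it uses an exhaustive search as a stand-in, and the names `search_hyper_parameters` and `toy_objective`, the parameter names, and the candidate values are all hypothetical. In a real setting the objective would train a model with the given configuration and return a validation score.

```python
import itertools


def search_hyper_parameters(search_space, objective):
    """Try every combination of user-defined values and keep the best one.

    Hypothetical helper for illustration only; `objective` maps a
    configuration dict to a score where higher is better.
    """
    names = sorted(search_space)
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*(search_space[name] for name in names)):
        config = dict(zip(names, values))
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config):
    """Stand-in for training and evaluation (made-up scoring function)."""
    return -abs(config["learning_rate"] - 0.01) - abs(config["embedding_dim"] - 50) / 1000


# "Optimization mode": the user supplies candidate values per hyper-parameter.
search_space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "embedding_dim": [50, 100],
}
best_config, best_score = search_hyper_parameters(search_space, toy_objective)

# "Training mode": the user supplies one fixed value per hyper-parameter.
fixed_config = {"learning_rate": 0.01, "embedding_dim": 100}
fixed_score = toy_objective(fixed_config)
```

Exhaustive search is only practical for small candidate sets; for larger ones, sampling a fixed number of random combinations from the same `search_space` is a common alternative.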