Ichise, Ryutaro
Negative Sampling in Knowledge Graph Representation Learning: A Review
Madushanka, Tiroshan, Ichise, Ryutaro
Knowledge graph representation learning (KGRL) or knowledge graph embedding (KGE) plays a crucial role in AI applications for knowledge construction and information exploration. These models aim to encode entities and relations present in a knowledge graph into a lower-dimensional vector space. During the training process of KGE models, using positive and negative samples becomes essential for discrimination purposes. However, obtaining negative samples directly from existing knowledge graphs poses a challenge, emphasizing the need for effective generation techniques. The quality of these negative samples greatly impacts the accuracy of the learned embeddings, making their generation a critical aspect of KGRL. This comprehensive survey paper systematically reviews various negative sampling (NS) methods and their contributions to the success of KGRL. Their respective advantages and disadvantages are outlined by categorizing existing NS methods into five distinct categories. Moreover, this survey identifies open research questions that serve as potential directions for future investigations. By offering a generalization and alignment of fundamental NS concepts, this survey provides valuable insights for designing effective NS methods in the context of KGRL and serves as a motivating force for further advancements in the field.
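The basic idea of generating negative samples can be illustrated with a minimal sketch of uniform random corruption, one simple strategy of the kind such a survey covers. The function name `corrupt_triple`, its parameters, and the toy graph are assumptions made for illustration and are not taken from the paper.

```python
import random

def corrupt_triple(triple, entities, known_triples, rng, max_tries=100):
    """Return a negative triple by replacing the head or tail with a random entity.

    Candidates that already appear in known_triples are rejected, because
    they would be false negatives.
    """
    head, relation, tail = triple
    for _ in range(max_tries):
        replacement = rng.choice(entities)
        if rng.random() < 0.5:
            candidate = (replacement, relation, tail)   # corrupt the head
        else:
            candidate = (head, relation, replacement)   # corrupt the tail
        if candidate not in known_triples:
            return candidate
    raise RuntimeError("no negative sample found; the graph may be too dense")

# Toy graph (hypothetical data).
entities = ["tokyo", "japan", "paris", "france"]
known_triples = {
    ("tokyo", "capital_of", "japan"),
    ("paris", "capital_of", "france"),
}
rng = random.Random(0)
negative = corrupt_triple(("tokyo", "capital_of", "japan"), entities, known_triples, rng)
print(negative)
```

Filtering against the known triples only guards against facts already in the graph; a corrupted triple may still be true but unobserved, which is one reason the quality of negative samples matters.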
TabEAno: Table to Knowledge Graph Entity Annotation
Nguyen, Phuc, Kertkeidkachorn, Natthawut, Ichise, Ryutaro, Takeda, Hideaki
In the Open Data era, a large number of table resources have been made available on the Web and data portals. However, it is difficult to directly utilize such data due to the ambiguity of entities, name variations, heterogeneous schemas, and missing or incomplete metadata. To address these issues, we propose a novel approach, namely TabEAno, to semantically annotate table rows toward knowledge graph entities. Specifically, we introduce a "two-cells" lookup strategy based on the assumption that there is an existing logical relation occurring in the knowledge graph between two close cells in the same row of the table. Despite the simplicity of the approach, TabEAno outperforms the state-of-the-art approaches on two standard datasets, T2D and Limaye, and on the large-scale Wikipedia tables dataset.
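The assumption behind the "two-cells" lookup can be sketched on a toy in-memory graph: candidate entities for two cells of one row are kept only if the graph links them. This is an illustrative sketch, not TabEAno's implementation; `LABEL_TO_ENTITIES`, `EDGES`, and `two_cell_lookup` are hypothetical names and the data is made up.

```python
from itertools import product

# Toy knowledge graph (hypothetical data): a label index and directed edges.
LABEL_TO_ENTITIES = {
    "Paris": ["Paris_(France)", "Paris_(Texas)"],
    "France": ["France"],
}
EDGES = {
    ("Paris_(France)", "capital_of", "France"),
    ("Paris_(Texas)", "located_in", "United_States"),
}

def two_cell_lookup(cell_a, cell_b):
    """Return entity pairs for two cells of one row that are linked in the graph."""
    # Treat edges as undirected: a relation in either direction counts as a link.
    linked = {(s, o) for s, _, o in EDGES} | {(o, s) for s, _, o in EDGES}
    candidates = product(LABEL_TO_ENTITIES.get(cell_a, []),
                         LABEL_TO_ENTITIES.get(cell_b, []))
    return [pair for pair in candidates if pair in linked]

matches = two_cell_lookup("Paris", "France")
print(matches)
```

Here the second cell disambiguates the first: only the candidate for "Paris" that is related to "France" survives the lookup.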
Combination of Unified Embedding Model and Observed Features for Knowledge Graph Completion
Ebisu, Takuma (SOKENDAI (The Graduate University for Advanced Studies); National Institute of Informatics) | Ichise, Ryutaro (National Institute of Informatics; SOKENDAI (The Graduate University for Advanced Studies); National Institute of Advanced Industrial Science and Technology)
Knowledge graphs are useful for many artificial intelligence tasks but often have missing data. Hence, a method for completing knowledge graphs is required. Existing approaches include embedding models, the Path Ranking Algorithm, and rule evaluation models. However, these approaches have limitations. For example, all the information is mixed and difficult to interpret in embedding models, and traditional rule evaluation models are basically slow. In this paper, we provide an integrated view of various approaches and combine them to compensate for their limitations. We first unify state-of-the-art embedding models, such as ComplEx and TorusE, reinterpreting them as a variant of translation-based models. Then, we show that these models utilize paths for link prediction and propose a method for evaluating rules based on this idea. Finally, we combine an embedding model and observed feature models to predict missing triples. This is possible because all of these models utilize paths. We also conduct experiments, including link prediction tasks, with standard datasets to evaluate our method and framework. The experiments show that our method can evaluate rules faster than traditional methods and that our framework outperforms state-of-the-art models in terms of link prediction.
Graph Pattern Entity Ranking Model for Knowledge Graph Completion
Ebisu, Takuma, Ichise, Ryutaro
Knowledge graphs have evolved rapidly in recent years and their usefulness has been demonstrated in many artificial intelligence tasks. However, knowledge graphs often have many missing facts. To solve this problem, many knowledge graph embedding models have been developed to populate knowledge graphs and these have shown outstanding performance. However, knowledge graph embedding models are so-called black boxes: the user does not know how the information in a knowledge graph is processed, and the models can be difficult to interpret. In this paper, we utilize graph patterns in a knowledge graph to overcome such problems. Our proposed model, the graph pattern entity ranking model (GRank), constructs an entity ranking system for each graph pattern and evaluates them using a ranking measure. By doing so, we can find graph patterns which are useful for predicting facts. Then, we perform link prediction tasks on standard datasets to evaluate our GRank method. We show that our approach outperforms other state-of-the-art approaches such as ComplEx and TorusE for standard metrics such as HITS@n and MRR. Moreover, our model is easily interpretable because the output facts are described by graph patterns.
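For readers unfamiliar with the metrics named above, the following sketch computes MRR and HITS@n from the rank assigned to the correct entity in each test query. The function names and the example ranks are illustrative assumptions, not results from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_n(ranks, n):
    """Fraction of test queries whose correct entity is ranked in the top n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)

# Hypothetical ranks of the correct entity for four test triples.
ranks = [1, 2, 4, 10]
mrr = mean_reciprocal_rank(ranks)
hits3 = hits_at_n(ranks, 3)
print(mrr, hits3)
```

Both measures lie between 0 and 1, and higher is better; MRR rewards placing the correct entity near the top, while HITS@n only checks whether it falls within the first n positions.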
EmbNum: Semantic labeling for numerical values with deep metric learning
Nguyen, Phuc, Nguyen, Khai, Ichise, Ryutaro, Takeda, Hideaki
Semantic labeling is the task of matching an unknown data source to labeled data sources. The semantic labels could be properties or classes in knowledge bases, or labels manually annotated by domain experts. In this paper, we present EmbNum, a novel approach to match numerical columns from different table data sources. We use a representation network architecture consisting of a triplet network and a convolutional neural network to learn a mapping function from numerical columns to a transformed space. In this space, the Euclidean distance can be used to measure the "semantic similarity" of two columns. Our experiments on City-Data and Open-Data demonstrate that EmbNum achieves considerable improvements in comparison with the state-of-the-art methods in effectiveness and efficiency.
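The objective that a triplet network typically optimizes can be sketched as a margin-based triplet loss over Euclidean distances. The abstract does not state EmbNum's exact loss, so the form below, the margin value, and the toy vectors are assumptions for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the positive is closer to the anchor than the negative by the margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical 2-D embeddings of three numerical columns.
anchor = [0.0, 0.0]
positive = [3.0, 4.0]      # distance 5 from the anchor
far_negative = [6.0, 8.0]  # distance 10 from the anchor
near_negative = [0.0, 1.0] # distance 1 from the anchor

satisfied = triplet_loss(anchor, positive, far_negative)
violated = triplet_loss(anchor, positive, near_negative)
print(satisfied, violated)
```

Training on such a loss pulls columns with the same semantic label together and pushes others apart, which is what lets a plain distance in the transformed space act as a similarity measure.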
TorusE: Knowledge Graph Embedding on a Lie Group
Ebisu, Takuma (SOKENDAI (The Graduate University for Advanced Studies)) | Ichise, Ryutaro (National Institute of Informatics)
Knowledge graphs are useful for many artificial intelligence (AI) tasks. However, knowledge graphs often have missing facts. To populate the graphs, knowledge graph embedding models have been developed. Knowledge graph embedding models map entities and relations in a knowledge graph to a vector space and predict unknown triples by scoring candidate triples. TransE is the first translation-based method and it is well known because of its simplicity and efficiency for knowledge graph completion. It employs the principle that the differences between entity embeddings represent their relations. The principle seems very simple, but it can effectively capture the rules of a knowledge graph. However, TransE has a problem with its regularization. TransE forces entity embeddings to be on a sphere in the embedding vector space. This regularization warps the embeddings and makes it difficult for them to fulfill the abovementioned principle. The regularization also adversely affects the accuracy of link prediction. On the other hand, regularization is important because, without it, entity embeddings diverge under negative sampling. This paper proposes a novel embedding model, TorusE, to solve the regularization problem. The principle of TransE can be defined on any Lie group. A torus, which is one of the compact Lie groups, can be chosen for the embedding space to avoid regularization. To the best of our knowledge, TorusE is the first model that embeds objects on a space other than a real or complex vector space, and this paper is the first to formally discuss the problem of regularization of TransE. Our approach outperforms other state-of-the-art approaches such as TransE, DistMult and ComplEx on a standard link prediction task. We show that TorusE is scalable to large-size knowledge graphs and is faster than the original TransE.
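The translation principle, and how it carries over to a torus, can be sketched as follows. Each torus coordinate is treated as a point on a circle of circumference 1, so differences wrap around and embeddings cannot diverge. The function names and toy vectors are assumptions, and the paper's exact scoring functions may differ in scaling and choice of norm.

```python
def transe_l1(head, relation, tail):
    """TransE-style dissimilarity: L1 norm of head + relation - tail."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))

def torus_l1(head, relation, tail):
    """Same principle on a torus: every coordinate is taken modulo 1."""
    total = 0.0
    for h, r, t in zip(head, relation, tail):
        diff = (h + r - t) % 1.0        # wrap the difference into [0, 1)
        total += min(diff, 1.0 - diff)  # shorter way around the circle
    return total

# Hypothetical embeddings; the first coordinate differs by exactly one full turn.
head = [0.9, 0.2]
relation = [0.2, 0.1]
tail = [0.1, 0.3]

vector_score = transe_l1(head, relation, tail)
torus_score = torus_l1(head, relation, tail)
print(vector_score, torus_score)
```

In the vector space the triple is penalized for the full-turn offset, while on the torus that offset vanishes because the space is compact, so no norm constraint on the entity embeddings is needed to keep them bounded.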
T2KG: An End-to-End System for Creating Knowledge Graph from Unstructured Text
Kertkeidkachorn, Natthawut (Sokendai) | Ichise, Ryutaro (Sokendai)
Knowledge graphs (KGs) play a crucial role in many modern applications. Nevertheless, constructing a KG from unstructured text is a challenging problem due to its nature. Consequently, many approaches propose to transform unstructured text to structured text in order to create a KG. Such approaches cannot yet provide reasonable results for mapping an extracted predicate to its identical predicate in another KG. Predicate mapping is an essential procedure because it can reduce the heterogeneity problem and increase searchability over a KG. In this paper, we propose the T2KG system, an end-to-end system that takes this problem into consideration. In the system, a hybrid combination of a rule-based approach and a similarity-based approach is presented for mapping a predicate to its identical predicate in a KG. Based on preliminary experimental results, the hybrid approach improves the recall by 10.02% and the F-measure by 6.56% without reducing the precision in the predicate mapping task. Furthermore, although the KG creation is conducted in open domains, the system still achieves approximately 50% of F-measure for generating triples in the KG creation task.
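A hybrid of rule-based and similarity-based predicate mapping might look like the sketch below: hand-written rules are tried first, and a similarity match is the fallback. The abstract does not specify the rules or the similarity measure, so `RULES`, `KG_PREDICATES`, the threshold, and the use of character-level string similarity from `difflib` are all stand-in assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical hand-written rules and target KG predicates.
RULES = {"was born in": "birthPlace"}
KG_PREDICATES = ["birthPlace", "deathPlace", "spouse"]

def map_predicate(extracted, threshold=0.6):
    """Map an extracted predicate to a KG predicate, or None if nothing fits."""
    # Rule-based step: an exact rule wins outright.
    if extracted in RULES:
        return RULES[extracted]

    # Similarity-based step: pick the most similar KG predicate, if similar enough.
    def similarity(candidate):
        return SequenceMatcher(None, extracted.lower(), candidate.lower()).ratio()

    best = max(KG_PREDICATES, key=similarity)
    return best if similarity(best) >= threshold else None

print(map_predicate("was born in"))
print(map_predicate("spouse of"))
print(map_predicate("xyz"))
```

Putting the rules first keeps precision high on known patterns, while the fallback is where a gain in recall would come from; the threshold controls how that trade-off is struck.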