Semantic Networks
Unsupervised Knowledge Graph Construction and Event-centric Knowledge Infusion for Scientific NLI
Wang, Chenglin, Zhou, Yucheng, Long, Guodong, Wang, Xiaodong, Xu, Xiaowei
With the advance of natural language inference (NLI), a rising demand for NLI is to handle scientific texts. Existing methods depend on pre-trained models (PTM) which lack domain-specific knowledge. To tackle this drawback, we introduce a scientific knowledge graph to generalize PTM to scientific domain. However, existing knowledge graph construction approaches suffer from some drawbacks, i.e., expensive labeled data, failure to apply in other domains, long inference time and difficulty extending to large corpora. Therefore, we propose an unsupervised knowledge graph construction method to build a scientific knowledge graph (SKG) without any labeled data. Moreover, to alleviate noise effect from SKG and complement knowledge in sentences better, we propose an event-centric knowledge infusion method to integrate external knowledge into each event that is a fine-grained semantic unit in sentences. Experimental results show that our method achieves state-of-the-art performance and the effectiveness and reliability of SKG.
Scaling Knowledge Graphs for Automating AI of Digital Twins
Ploennigs, Joern, Semertzidis, Konstantinos, Lorenzi, Fabio, Mihindukulasooriya, Nandana
Digital Twins are digital representations of systems in the Internet of Things (IoT) that are often based on AI models that are trained on data from those systems. Semantic models are used increasingly to link these datasets from different stages of the IoT systems life-cycle together and to automatically configure the AI modelling pipelines. This combination of semantic models with AI pipelines running on external datasets raises unique challenges particular if rolled out at scale. Within this paper we will discuss the unique requirements of applying semantic graphs to automate Digital Twins in different practical use cases. We will introduce the benchmark dataset DTBM that reflects these characteristics and look into the scaling challenges of different knowledge graph technologies. Based on these insights we will propose a reference architecture that is in-use in multiple products in IBM and derive lessons learned for scaling knowledge graphs for configuring AI models for Digital Twins.
ProVe: A Pipeline for Automated Provenance Verification of Knowledge Graphs against Textual Sources
Amaral, Gabriel, Rodrigues, Odinaldo, Simperl, Elena
A Knowledge Graph (KG) is a type of knowledge base that stores information in the form of semantic triples formed by a subject, a predicate, and an object. KGs represent both real and abstract entities internally as labelled and uniquely identifiable entities, such as The Moon or Happiness, and can amass information from a multitude of domains and sources by connecting such entities amongst themselves or to literals through relationships, coded via uniquely identified predicates. KGs serve as sources of both human and machine-readable semantically structured data for various crucial applications in the modern web landscape, such as Wikipedia infoboxes, search engines results, voice-activated assistants, and information gathering projects [30]. Developed and maintained by ontology experts, data curators, and even anonymous volunteers, KGs have massively grown in size and adoption in the last decade, mainly as secondary sources of information. This means not storing new information, but taking it from authoritative and reliable sources which are explicitly referenced. As such, KGs depend on well-documented and verifiable provenance to ensure they are regarded as trustworthy and usable [56]. Processes to assess and assure the quality of information provenance are thus crucial to KGs, especially measuring and maintaining verifiability, i.e. the degree to which consumers of KG triples can attest these are truly supported by their sources [56]. However, such processes are currently performed mostly manually, which does not scale with size. Manually ensuring high verifiability on vital KGs such as Wikidata and DBpedia is prohibitive due to their sheer size.
PALT: Parameter-Lite Transfer of Language Models for Knowledge Graph Completion
Shen, Jianhao, Wang, Chenguang, Yuan, Ye, Han, Jiawei, Ji, Heng, Sen, Koushik, Zhang, Ming, Song, Dawn
This paper presents a parameter-lite transfer learning approach of pretrained language models (LM) for knowledge graph (KG) completion. Instead of finetuning, which modifies all LM parameters, we only tune a few new parameters while keeping the original LM parameters fixed. We establish this via reformulating KG completion as a "fill-in-the-blank" task, and introducing a parameter-lite encoder on top of the original LMs. We show that, by tuning far fewer parameters than finetuning, LMs transfer non-trivially to most tasks and reach competitiveness with prior state-of-the-art approaches. For instance, we outperform the fully finetuning approaches on a KG completion benchmark by tuning only 1% of the parameters. The code and datasets are available at \url{https://github.com/yuanyehome/PALT}.
ReaRev: Adaptive Reasoning for Question Answering over Knowledge Graphs
Mavromatis, Costas, Karypis, George
Knowledge Graph Question Answering (KGQA) involves retrieving entities as answers from a Knowledge Graph (KG) using natural language queries. The challenge is to learn to reason over question-relevant KG facts that traverse KG entities and lead to the question answers. To facilitate reasoning, the question is decoded into instructions, which are dense question representations used to guide the KG traversals. However, if the derived instructions do not exactly match the underlying KG information, they may lead to reasoning under irrelevant context. Our method, termed ReaRev, introduces a new way to KGQA reasoning with respect to both instruction decoding and execution. To improve instruction decoding, we perform reasoning in an adaptive manner, where KG-aware information is used to iteratively update the initial instructions. To improve instruction execution, we emulate breadth-first search (BFS) with graph neural networks (GNNs). The BFS strategy treats the instructions as a set and allows our method to decide on their execution order on the fly. Experimental results on three KGQA benchmarks demonstrate the ReaRev's effectiveness compared with previous state-of-the-art, especially when the KG is incomplete or when we tackle complex questions. Our code is publicly available at https://github.com/cmavro/ReaRev_KGQA.
TranSHER: Translating Knowledge Graph Embedding with Hyper-Ellipsoidal Restriction
Li, Yizhi, Fan, Wei, Liu, Chao, Lin, Chenghua, Qian, Jiang
Knowledge graph embedding methods are important for the knowledge graph completion (or link prediction) task. One existing efficient method, PairRE, leverages two separate vectors to model complex relations (i.e., 1-to-N, N-to-1, and N-to-N) in knowledge graphs. However, such a method strictly restricts entities on the hyper-ellipsoid surfaces which limits the optimization of entity distribution, leading to suboptimal performance of knowledge graph completion. To address this issue, we propose a novel score function TranSHER, which leverages relation-specific translations between head and tail entities to relax the constraint of hyper-ellipsoid restrictions. By introducing an intuitive and simple relation-specific translation, TranSHER can provide more direct guidance on optimization and capture more semantic characteristics of entities with complex relations. Experimental results show that TranSHER achieves significant performance improvements on link prediction and generalizes well to datasets in different domains and scales. Our codes are public available at https://github.com/yizhilll/TranSHER.
SMiLE: Schema-augmented Multi-level Contrastive Learning for Knowledge Graph Link Prediction
Peng, Miao, Liu, Ben, Xie, Qianqian, Xu, Wenjie, Wang, Hua, Peng, Min
Link prediction is the task of inferring missing links between entities in knowledge graphs. Embedding-based methods have shown effectiveness in addressing this problem by modeling relational patterns in triples. However, the link prediction task often requires contextual information in entity neighborhoods, while most existing embedding-based methods fail to capture it. Additionally, little attention is paid to the diversity of entity representations in different contexts, which often leads to false prediction results. In this situation, we consider that the schema of knowledge graph contains the specific contextual information, and it is beneficial for preserving the consistency of entities across contexts. In this paper, we propose a novel Schema-augmented Multi-level contrastive LEarning framework (SMiLE) to conduct knowledge graph link prediction. Specifically, we first exploit network schema as the prior constraint to sample negatives and pre-train our model by employing a multi-level contrastive learning method to yield both prior schema and contextual information. Then we fine-tune our model under the supervision of individual triples to learn subtler representations for link prediction. Extensive experimental results on four knowledge graph datasets with thorough analysis of each component demonstrate the effectiveness of our proposed framework against state-of-the-art baselines. The implementation of SMiLE is available at https://github.com/GKNL/SMiLE.
EDUKG: a Heterogeneous Sustainable K-12 Educational Knowledge Graph
Zhao, Bowen, Sun, Jiuding, Xu, Bin, Lu, Xingyu, Li, Yuchen, Yu, Jifan, Liu, Minghui, Zhang, Tingjian, Chen, Qiuyang, Li, Hanming, Hou, Lei, Li, Juanzi
Web and artificial intelligence technologies, especially semantic web and knowledge graph (KG), have recently raised significant attention in educational scenarios. Nevertheless, subject-specific KGs for K-12 education still lack sufficiency and sustainability from knowledge and data perspectives. To tackle these issues, we propose EDUKG, a heterogeneous sustainable K-12 Educational Knowledge Graph. We first design an interdisciplinary and fine-grained ontology for uniformly modeling knowledge and resource in K-12 education, where we define 635 classes, 445 object properties, and 1314 datatype properties in total. Guided by this ontology, we propose a flexible methodology for interactively extracting factual knowledge from textbooks. Furthermore, we establish a general mechanism based on our proposed generalized entity linking system for EDUKG's sustainable maintenance, which can dynamically index numerous heterogeneous resources and data with knowledge topics in EDUKG. We further evaluate EDUKG to illustrate its sufficiency, richness, and variability. We publish EDUKG with more than 252 million entities and 3.86 billion triplets. Our code and data repository is now available at https://github.com/THU-KEG/EDUKG.
DEER: Descriptive Knowledge Graph for Explaining Entity Relationships
Huang, Jie, Zhu, Kerui, Chang, Kevin Chen-Chuan, Xiong, Jinjun, Hwu, Wen-mei
We propose DEER (Descriptive Knowledge Graph for Explaining Entity Relationships) - an open and informative form of modeling entity relationships. In DEER, relationships between entities are represented by free-text relation descriptions. For instance, the relationship between entities of machine learning and algorithm can be represented as ``Machine learning explores the study and construction of algorithms that can learn from and make predictions on data.'' To construct DEER, we propose a self-supervised learning method to extract relation descriptions with the analysis of dependency patterns and generate relation descriptions with a transformer-based relation description synthesizing model, where no human labeling is required. Experiments demonstrate that our system can extract and generate high-quality relation descriptions for explaining entity relationships. The results suggest that we can build an open and informative knowledge graph without human annotation.
Transformer-based Entity Typing in Knowledge Graphs
Hu, Zhiwei, Gutiérrez-Basulto, Víctor, Xiang, Zhiliang, Li, Ru, Pan, Jeff Z.
We investigate the knowledge graph entity typing task which aims at inferring plausible entity types. In this paper, we propose a novel Transformer-based Entity Typing (TET) approach, effectively encoding the content of neighbors of an entity. More precisely, TET is composed of three different mechanisms: a local transformer allowing to infer missing types of an entity by independently encoding the information provided by each of its neighbors; a global transformer aggregating the information of all neighbors of an entity into a single long sequence to reason about more complex entity types; and a context transformer integrating neighbors content based on their contribution to the type inference through information exchange between neighbor pairs. Furthermore, TET uses information about class membership of types to semantically strengthen the representation of an entity. Experiments on two real-world datasets demonstrate the superior performance of TET compared to the state-of-the-art.