Semantic Networks
To Copy Rather Than Memorize: A Vertical Learning Paradigm for Knowledge Graph Completion
Li, Rui, Chen, Xu, Li, Chaozhuo, Shen, Yanming, Zhao, Jianan, Wang, Yujing, Han, Weihao, Sun, Hao, Deng, Weiwei, Zhang, Qi, Xie, Xing
Embedding models have shown great power in knowledge graph completion (KGC) task. By learning structural constraints for each training triple, these methods implicitly memorize intrinsic relation rules to infer missing links. However, this paper points out that the multi-hop relation rules are hard to be reliably memorized due to the inherent deficiencies of such implicit memorization strategy, making embedding models underperform in predicting links between distant entity pairs. To alleviate this problem, we present Vertical Learning Paradigm (VLP), which extends embedding models by allowing to explicitly copy target information from related factual triples for more accurate prediction. Rather than solely relying on the implicit memory, VLP directly provides additional cues to improve the generalization ability of embedding models, especially making the distant link prediction significantly easier. Moreover, we also propose a novel relative distance based negative sampling technique (ReD) for more effective optimization. Experiments demonstrate the validity and generality of our proposals on two standard benchmarks. Our code is available at https://github.com/rui9812/VLP.
Query Structure Modeling for Inductive Logical Reasoning Over Knowledge Graphs
Wang, Siyuan, Wei, Zhongyu, Han, Meng, Fan, Zhihao, Shan, Haijun, Zhang, Qi, Huang, Xuanjing
Logical reasoning over incomplete knowledge graphs to answer complex logical queries is a challenging task. With the emergence of new entities and relations in constantly evolving KGs, inductive logical reasoning over KGs has become a crucial problem. However, previous PLMs-based methods struggle to model the logical structures of complex queries, which limits their ability to generalize within the same structure. In this paper, we propose a structure-modeled textual encoding framework for inductive logical reasoning over KGs. It encodes linearized query structures and entities using pre-trained language models to find answers. For structure modeling of complex queries, we design stepwise instructions that implicitly prompt PLMs on the execution order of geometric operations in each query. We further separately model different geometric operations (i.e., projection, intersection, and union) on the representation space using a pre-trained encoder with additional attention and maxout layers to enhance structured modeling. We conduct experiments on two inductive logical reasoning datasets and three transductive datasets. The results demonstrate the effectiveness of our method on logical reasoning over KGs in both inductive and transductive settings.
3D Rotation and Translation for Hyperbolic Knowledge Graph Embedding
Zhu, Yihua, Shimodaira, Hidetoshi
The main objective of Knowledge Graph (KG) embeddings is to learn low-dimensional representations of entities and relations, enabling the prediction of missing facts. A significant challenge in achieving better KG embeddings lies in capturing relation patterns, including symmetry, antisymmetry, inversion, commutative composition, non-commutative composition, hierarchy, and multiplicity. This study introduces a novel model called 3H-TH (3D Rotation and Translation in Hyperbolic space) that captures these relation patterns simultaneously. In contrast, previous attempts have not achieved satisfactory performance across all the mentioned properties at the same time. The experimental results demonstrate that the new model outperforms existing state-of-the-art models in terms of accuracy, hierarchy property, and other relation patterns in low-dimensional space, meanwhile performing similarly in high-dimensional space.
Logical Entity Representation in Knowledge-Graphs for Differentiable Rule Learning
Han, Chi, He, Qizheng, Yu, Charles, Du, Xinya, Tong, Hanghang, Ji, Heng
Probabilistic logical rule learning has shown great strength in logical rule mining and knowledge graph completion. It learns logical rules to predict missing edges by reasoning on existing edges in the knowledge graph. However, previous efforts have largely been limited to only modeling chain-like Horn clauses such as $R_1(x,z)\land R_2(z,y)\Rightarrow H(x,y)$. This formulation overlooks additional contextual information from neighboring sub-graphs of entity variables $x$, $y$ and $z$. Intuitively, there is a large gap here, as local sub-graphs have been found to provide important information for knowledge graph completion. Inspired by these observations, we propose Logical Entity RePresentation (LERP) to encode contextual information of entities in the knowledge graph. A LERP is designed as a vector of probabilistic logical functions on the entity's neighboring sub-graph. It is an interpretable representation while allowing for differentiable optimization. We can then incorporate LERP into probabilistic logical rule learning to learn more expressive rules. Empirical results demonstrate that with LERP, our model outperforms other rule learning methods in knowledge graph completion and is comparable or even superior to state-of-the-art black-box methods. Moreover, we find that our model can discover a more expressive family of logical rules. LERP can also be further combined with embedding learning methods like TransE to make it more interpretable.
Self-Distillation with Meta Learning for Knowledge Graph Completion
Li, Yunshui, Liu, Junhao, Li, Chengming, Yang, Min
In this paper, we propose a selfdistillation framework with meta learning(MetaSD) for knowledge graph completion with dynamic pruning, which aims to learn compressed graph embeddings and tackle the longtail samples. Specifically, we first propose a dynamic pruning technique to obtain a small pruned model from a large source model, where the pruning mask of the pruned model could be updated adaptively per epoch after the model weights are updated. The pruned model is supposed to be more sensitive to difficult to memorize samples(e.g., longtail samples) than the source model. Then, we propose a onestep meta selfdistillation method for distilling comprehensive knowledge from the source model to the pruned model, where the two models coevolve in a dynamic manner during training. In particular, we exploit the performance of the pruned model, which is trained alongside the source model in one iteration, to improve the source models knowledge transfer ability for the next iteration via meta learning. Extensive experiments show that MetaSD achieves competitive performance compared to strong baselines, while being 10x smaller than baselines.
PubGraph: A Large-Scale Scientific Knowledge Graph
Ahrabian, Kian, Du, Xinwei, Myloth, Richard Delwin, Ananthan, Arun Baalaaji Sankar, Pujara, Jay
Research publications are the primary vehicle for sharing scientific progress in the form of new discoveries, methods, techniques, and insights. Unfortunately, the lack of a large-scale, comprehensive, and easy-to-use resource capturing the myriad relationships between publications, their authors, and venues presents a barrier to applications for gaining a deeper understanding of science. In this paper, we present PubGraph, a new resource for studying scientific progress that takes the form of a large-scale knowledge graph (KG) with more than 385M entities, 13B main edges, and 1.5B qualifier edges. PubGraph is comprehensive and unifies data from various sources, including Wikidata, OpenAlex, and Semantic Scholar, using the Wikidata ontology. Beyond the metadata available from these sources, PubGraph includes outputs from auxiliary community detection algorithms and large language models. To further support studies on reasoning over scientific networks, we create several large-scale benchmarks extracted from PubGraph for the core task of knowledge graph completion (KGC). These benchmarks present many challenges for knowledge graph embedding models, including an adversarial community-based KGC evaluation setting, zero-shot inductive learning, and large-scale learning. All of the aforementioned resources are accessible at https://pubgraph.isi.edu/ and released under the CC-BY-SA license. We plan to update PubGraph quarterly to accommodate the release of new publications.
GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation
Colas, Anthony, Alvandipour, Mehrdad, Wang, Daisy Zhe
Recent improvements in KG-to-text generation are due to additional auxiliary pre-training tasks designed to give the fine-tune task a boost in performance. These tasks require extensive computational resources while only suggesting marginal improvements. Here, we demonstrate that by fusing graph-aware elements into existing pre-trained language models, we are able to outperform state-of-the-art models and close the gap imposed by additional pre-training tasks. We do so by proposing a mask structure to capture neighborhood information and a novel type encoder that adds a bias to the graph-attention weights depending on the connection type. Experiments on two KG-to-text benchmark datasets show our models are competitive while involving fewer parameters and no additional pre-training tasks. By formulating the problem as a framework, we can interchange the various proposed components and begin interpreting KG-to-text generative models based on the topological and type information found in a graph.
FactKG: Fact Verification via Reasoning on Knowledge Graphs
Kim, Jiho, Park, Sungjin, Kwon, Yeonsu, Jo, Yohan, Thorne, James, Choi, Edward
In real world applications, knowledge graphs (KG) are widely used in various domains (e.g. medical applications and dialogue agents). However, for fact verification, KGs have not been adequately utilized as a knowledge source. KGs can be a valuable knowledge source in fact verification due to their reliability and broad applicability. A KG consists of nodes and edges which makes it clear how concepts are linked together, allowing machines to reason over chains of topics. However, there are many challenges in understanding how these machine-readable concepts map to information in text. To enable the community to better use KGs, we introduce a new dataset, FactKG: Fact Verification via Reasoning on Knowledge Graphs. It consists of 108k natural language claims with five types of reasoning: One-hop, Conjunction, Existence, Multi-hop, and Negation. Furthermore, FactKG contains various linguistic patterns, including colloquial style claims as well as written style claims to increase practicality. Lastly, we develop a baseline approach and analyze FactKG over these reasoning types. We believe FactKG can advance both reliability and practicality in KG-based fact verification.
River of No Return: Graph Percolation Embeddings for Efficient Knowledge Graph Reasoning
Wang, Kai, Luo, Siqiang, Lin, Dan
We study Graph Neural Networks (GNNs)-based embedding techniques for knowledge graph (KG) reasoning. For the first time, we link the path redundancy issue in the state-of-the-art KG reasoning models based on path encoding and message passing to the transformation error in model training, which brings us new theoretical insights into KG reasoning, as well as high efficacy in practice. On the theoretical side, we analyze the entropy of transformation error in KG paths and point out query-specific redundant paths causing entropy increases. These findings guide us to maintain the shortest paths and remove redundant paths for minimized-entropy message passing. To achieve this goal, on the practical side, we propose an efficient Graph Percolation Process motivated by the percolation model in Fluid Mechanics, and design a lightweight GNN-based KG reasoning framework called Graph Percolation Embeddings (GraPE). GraPE outperforms previous state-of-the-art methods in both transductive and inductive reasoning tasks while requiring fewer training parameters and less inference time.
Analysing Biomedical Knowledge Graphs using Prime Adjacency Matrices
Bougiatiotis, Konstantinos, Paliouras, Georgios
Most phenomena related to biomedical tasks are inherently complex, and in many cases, are expressed as signals on biomedical Knowledge Graphs (KGs). In this work, we introduce the use of a new representation framework, the Prime Adjacency Matrix (PAM) for biomedical KGs, which allows for very efficient network analysis. PAM utilizes prime numbers to enable representing the whole KG with a single adjacency matrix and the fast computation of multiple properties of the network. We illustrate the applicability of the framework in the biomedical domain by working on different biomedical knowledge graphs and by providing two case studies: one on drug-repurposing for COVID-19 and one on important metapath extraction. We show that we achieve better results than the original proposed workflows, using very simple methods that require no training, in considerably less time.