FB15k-237




DrKGC: Dynamic Subgraph Retrieval-Augmented LLMs for Knowledge Graph Completion across General and Biomedical Domains

Xiao, Yongkang, Zhang, Sinian, Dai, Yi, Zhou, Huixue, Hou, Jue, Ding, Jie, Zhang, Rui

arXiv.org Artificial Intelligence

Knowledge graph completion (KGC) aims to predict missing triples in knowledge graphs (KGs) by leveraging existing triples and textual information. Recently, generative large language models (LLMs) have been increasingly employed for graph tasks. However, current approaches typically encode graph context in textual form, which fails to fully exploit the potential of LLMs for perceiving and reasoning about graph structures. To address this limitation, we propose DrKGC (Dynamic Subgraph Retrieval-Augmented LLMs for Knowledge Graph Completion). DrKGC employs a flexible lightweight model training strategy to learn structural embeddings and logical rules within the KG. It then leverages a novel bottom-up graph retrieval method to extract a subgraph for each query guided by the learned rules. Finally, a graph convolutional network (GCN) adapter uses the retrieved subgraph to enhance the structural embeddings, which are then integrated into the prompt for effective LLM fine-tuning. Experimental results on two general domain benchmark datasets and two biomedical datasets demonstrate the superior performance of DrKGC. Furthermore, a realistic case study in the biomedical domain highlights its interpretability and practical utility.
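To make the final step of the pipeline concrete, here is a minimal sketch of how a GCN adapter might refine structural embeddings over a retrieved subgraph and project them into the LLM's hidden space as soft prompt vectors. The class name, two-layer design, residual connection, and dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GCNAdapter(nn.Module):
    """Sketch: refine structural embeddings with a retrieved subgraph,
    then project them to the LLM hidden size so they can be prepended
    to the prompt as soft tokens. Names and dims are hypothetical."""
    def __init__(self, emb_dim: int, llm_dim: int):
        super().__init__()
        self.gcn1 = nn.Linear(emb_dim, emb_dim)
        self.gcn2 = nn.Linear(emb_dim, emb_dim)
        self.proj = nn.Linear(emb_dim, llm_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (n_nodes, emb_dim) structural embeddings of subgraph nodes
        # adj: (n_nodes, n_nodes) adjacency of the retrieved subgraph
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        a_norm = adj / deg                      # row-normalised propagation
        h = torch.relu(self.gcn1(a_norm @ x))   # first GCN layer
        h = self.gcn2(a_norm @ h) + x           # second layer + residual
        return self.proj(h)                     # (n_nodes, llm_dim) soft-prompt vectors
```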


KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge

Neural Information Processing Systems

While current KGE methods have shown success, many are limited to the graph structure alone, neglecting the wealth of open-world knowledge surrounding entities that is not explicitly captured in the KG, which in most cases is manually created.



Evaluating Cumulative Spectral Gradient as a Complexity Measure

Gul, Haji, Naim, Abdul Ghani, Bhat, Ajaz Ahmad

arXiv.org Artificial Intelligence

Accurate estimation of dataset complexity is crucial for evaluating and comparing link-prediction models for knowledge graphs (KGs). The Cumulative Spectral Gradient (CSG) metric (Branchaud-Charron et al., 2019), derived from probabilistic divergence between classes within a spectral clustering framework, was proposed as a dataset complexity measure that (1) naturally scales with the number of classes and (2) correlates strongly with downstream classification performance. In this work, we rigorously assess CSG's behavior on standard knowledge-graph link-prediction benchmarks (a multi-class tail-prediction task), varying the two key parameters governing its computation: M, the number of Monte Carlo-sampled points per class, and K, the number of nearest neighbors in the embedding space. Contrary to the original claims, we find that (1) CSG is highly sensitive to the choice of K, and therefore does not inherently scale with the number of target classes, and (2) CSG values exhibit weak or no correlation with established performance metrics such as mean reciprocal rank (MRR). Through experiments on FB15k-237, WN18RR, and other standard datasets, we demonstrate that CSG's purported stability and generalization-predictive power break down in link-prediction settings. Our results highlight the need for more robust, classifier-agnostic complexity measures in KG link-prediction evaluation.
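To make the roles of M and K concrete, the sketch below implements the first stage of the CSG pipeline (per-class Monte Carlo sampling plus K-NN class-confusion estimates) and the Laplacian eigenvalue computation. The final scalar aggregation over eigen-gaps is a simplified stand-in; the exact cumulative-gradient formula is given in Branchaud-Charron et al. (2019).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def class_similarity(emb, labels, M=100, K=5, rng=None):
    """Monte-Carlo estimate of inter-class confusion via K-NN in an
    embedding space. emb: (n, d) array; labels: (n,) array."""
    rng = rng or np.random.default_rng(0)
    classes = np.unique(labels)
    C = len(classes)
    knn = NearestNeighbors(n_neighbors=K).fit(emb)
    S = np.zeros((C, C))
    for i, c in enumerate(classes):
        idx = rng.choice(np.flatnonzero(labels == c), size=M, replace=True)
        _, nbrs = knn.kneighbors(emb[idx])         # (M, K) neighbour indices
        for j, c2 in enumerate(classes):
            S[i, j] = np.mean(labels[nbrs] == c2)  # P(neighbour class c2 | true class c)
    return S

def csg_proxy(S):
    """SIMPLIFIED stand-in for the CSG aggregation: eigen-gap statistics
    of the Laplacian of the symmetrised class-similarity graph. It only
    illustrates where K and M enter the computation."""
    W = 0.5 * (S + S.T)
    L = np.diag(W.sum(axis=1)) - W
    lam = np.linalg.eigvalsh(L)                    # ascending eigenvalues
    return float(np.maximum.accumulate(np.diff(lam)).sum())
```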


RSCF: Relation-Semantics Consistent Filter for Entity Embedding of Knowledge Graph

Kim, Junsik, Park, Jinwook, Kim, Kangil

arXiv.org Artificial Intelligence

In knowledge graph embedding, leveraging relation-specific entity transformations has markedly enhanced performance. However, the consistency of embedding differences before and after transformation remains unaddressed, risking the loss of valuable inductive bias inherent in the embeddings. This inconsistency stems from two problems. First, transformation representations are specified for relations in a disconnected manner, allowing dissimilar transformations and corresponding entity embeddings for similar relations. Second, a generalized plug-in approach such as SFBR (Semantic Filter Based on Relations) disrupts this consistency through excessive concentration of entity embeddings under entity-based regularization, generating indistinguishable score distributions among relations. In this paper, we introduce a plug-in KGE method, the Relation-Semantics Consistent Filter (RSCF). Its entity transformation has three features for enhancing semantic consistency: 1) a shared affine transformation of relation embeddings across all relations, 2) a rooted entity transformation that adds an entity embedding to its change, represented by the transformed vector, and 3) normalization of the change to prevent scale reduction. To amplify the advantages of consistency that preserve semantics in the embeddings, RSCF adds relation transformation and prediction modules for enhancing the semantics. In knowledge graph completion tasks with distance-based and tensor decomposition models, RSCF significantly outperforms state-of-the-art KGE methods, showing robustness across all relations and their frequencies.
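The three features read naturally as a small module; the sketch below is reconstructed from the abstract alone, so the exact form of the filter, the normalization target, and all names are assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RSCFFilterSketch(nn.Module):
    """Hedged sketch of a relation-semantics consistent entity filter:
    (1) one affine map shared across all relations, (2) a rooted
    transformation e + delta, (3) a norm-preserved change delta."""
    def __init__(self, dim: int):
        super().__init__()
        self.shared = nn.Linear(dim, dim)   # (1) shared affine transformation

    def forward(self, e: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
        filt = self.shared(r)               # relation-specific filter from the shared map
        delta = filt * e                    # candidate change to the entity embedding
        # (3) rescale the change so its magnitude is not collapsed
        #     (scaling to the entity norm is an assumption, not the paper's rule)
        delta = F.normalize(delta, dim=-1) * e.norm(dim=-1, keepdim=True)
        return e + delta                    # (2) rooted transformation
```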


MuCo-KGC: Multi-Context-Aware Knowledge Graph Completion

Gul, Haji, Bhat, Ajaz Ahmad, Naim, Abdul Ghani Haji

arXiv.org Artificial Intelligence

Knowledge graph completion (KGC) seeks to predict missing entities (e.g., heads or tails) or relationships in knowledge graphs (KGs), which often contain incomplete data. Traditional embedding-based methods, such as TransE and ComplEx, have improved tail entity prediction but struggle to generalize to unseen entities during testing. Text-based models mitigate this issue by leveraging additional semantic context; however, their reliance on negative triplet sampling introduces high computational overhead, semantic inconsistencies, and data imbalance. Recent approaches, like KG-BERT, show promise but depend heavily on entity descriptions, which are often unavailable in KGs. Critically, existing methods overlook valuable structural information in the KG related to the entities and relationships. To address these challenges, we propose Multi-Context-Aware Knowledge Graph Completion (MuCo-KGC), a novel model that utilizes contextual information from linked entities and relations within the graph to predict tail entities. MuCo-KGC eliminates the need for entity descriptions and negative triplet sampling, significantly reducing computational complexity while enhancing performance. Our experiments on standard datasets, including FB15k-237, WN18RR, CoDEx-S, and CoDEx-M, demonstrate that MuCo-KGC outperforms state-of-the-art methods on three of the four datasets. Notably, MuCo-KGC improves MRR on the WN18RR, CoDEx-S, and CoDEx-M datasets by $1.63\%$, $3.77\%$, and $20.15\%$, respectively, demonstrating its effectiveness for KGC tasks.
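One way to see how dropping negative sampling works in practice: score every entity as a tail candidate and train with a full softmax. The sketch below is an illustration of that idea under assumed names and a mean-pooled context, not the authors' model.

```python
import torch
import torch.nn as nn

class MuCoKGCSketch(nn.Module):
    """Illustrative sketch: score all entities as tail candidates from
    the head, the query relation, and pooled embeddings of entities
    linked to the head. Full-softmax training replaces negative
    triplet sampling."""
    def __init__(self, n_ent: int, n_rel: int, dim: int):
        super().__init__()
        self.ent = nn.Embedding(n_ent, dim)
        self.rel = nn.Embedding(n_rel, dim)
        self.mix = nn.Linear(3 * dim, dim)

    def forward(self, h, r, ctx_ent_ids):
        # ctx_ent_ids: (batch, n_ctx) entities linked to the head in the KG
        ctx = self.ent(ctx_ent_ids).mean(dim=1)        # pooled neighbourhood context
        q = self.mix(torch.cat([self.ent(h), self.rel(r), ctx], dim=-1))
        return q @ self.ent.weight.T                   # (batch, n_ent) tail logits

# training uses plain cross-entropy over all entities, no negatives:
# loss = nn.functional.cross_entropy(model(h, r, ctx), t)
```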


Review for NeurIPS paper: Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs

Neural Information Processing Systems

The theoretical analysis of the model is insufficient. For example, the authors do not analyze the full expressiveness of the model. That is, given any world with correct answers W and false answers Wc for some first-order logic queries, does there exist an assignment of model parameters that correctly classifies the entities in W and Wc? The reviewer is especially curious about the theoretical analysis of the proposed probabilistic negation operator, because there are no comparative empirical results on answering queries with negation (none of the existing models can handle negation). On EPFO queries, the authors compare the proposed model with only two baselines.
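For context, the negation operator the reviewer refers to acts on Beta-distribution embeddings by taking reciprocals of the shape parameters, which flips unimodal densities into bimodal ones concentrated at the boundary; a worked statement of the operator:

```latex
% Probabilistic negation on a Beta embedding (per the paper under review):
%   \neg \mathrm{Beta}(\alpha, \beta) = \mathrm{Beta}(1/\alpha, 1/\beta).
% For \alpha, \beta > 1 the density is unimodal with mass in the interior;
% its negation has \alpha, \beta < 1, pushing mass toward 0 and 1 and
% acting as a soft complement of the original region.
\neg \mathrm{Beta}(\alpha, \beta) \;=\; \mathrm{Beta}\!\left(\tfrac{1}{\alpha}, \tfrac{1}{\beta}\right)
```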


PathE: Leveraging Entity-Agnostic Paths for Parameter-Efficient Knowledge Graph Embeddings

Reklos, Ioannis, de Berardinis, Jacopo, Simperl, Elena, Meroño-Peñuela, Albert

arXiv.org Artificial Intelligence

Knowledge Graphs (KGs) store human knowledge in the form of entities (nodes) and relations, and are used extensively in various applications. KG embeddings are an effective approach to addressing tasks like knowledge discovery, link prediction, and reasoning. This is often done by allocating and learning embedding tables for all or a subset of the entities. As this scales linearly with the number of entities, learning embedding models in real-world KGs with millions of nodes can be computationally intractable. To address this scalability problem, our model, PathE, only allocates embedding tables for relations (which are typically orders of magnitude fewer than the entities) and requires less than 25% of the parameters of previous parameter-efficient methods. Rather than storing entity embeddings, we learn to compute them by leveraging multiple entity-relation paths to contextualise individual entities within triples. Evaluated on four benchmarks, PathE achieves state-of-the-art performance in relation prediction, and remains competitive in link prediction on path-rich KGs while training on consumer-grade hardware. We perform ablation experiments to test our design choices and analyse the sensitivity of the model to key hyper-parameters. PathE is efficient and cost-effective for relationally diverse and well-connected KGs commonly found in real-world applications.
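The entity-agnostic idea is simple to sketch: the only embedding table holds relations, and an entity's representation is computed on the fly from the relation paths passing through it. The encoder choice (a GRU) and the mean pooling below are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PathESketch(nn.Module):
    """Hedged sketch: no entity embedding table. An entity is
    represented by encoding relation paths through it, so parameters
    scale with the number of relations, not entities."""
    def __init__(self, n_rel: int, dim: int):
        super().__init__()
        self.rel = nn.Embedding(n_rel, dim)   # the only embedding table
        self.enc = nn.GRU(dim, dim, batch_first=True)

    def entity_repr(self, rel_paths: torch.Tensor) -> torch.Tensor:
        # rel_paths: (n_paths, path_len) relation ids of paths through the entity
        _, h = self.enc(self.rel(rel_paths))  # encode each relation path
        return h[-1].mean(dim=0)              # pool paths -> (dim,) entity vector
```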