Goto

Collaborating Authors

 Semantic Networks


UrbanKGent: A Unified Large Language Model Agent Framework for Urban Knowledge Graph Construction

Neural Information Processing Systems

Urban knowledge graph has recently worked as an emerging building block to distill critical knowledge from multi-sourced urban data for diverse urban application scenarios. Despite its promising benefits, urban knowledge graph construction (UrbanKGC) still heavily relies on manual effort, hindering its potential advancement. This paper presents UrbanKGent, a unified large language model agent framework, for urban knowledge graph construction. Specifically, we first construct the knowledgeable instruction set for UrbanKGC tasks (such as relational triplet extraction and knowledge graph completion) via heterogeneity-aware and geospatial-infused instruction generation. Moreover, we propose a tool-augmented iterative trajectory refinement module to enhance and refine the trajectories distilled from GPT-4. Through hybrid instruction fine-tuning with augmented trajectories on Llama 2 and Llama 3 family, we obtain UrbanKGC agent family, consisting of UrbanKGent-7/8/13B version.


Knowledge Graph Completion by Intermediate Variables Regularization

Neural Information Processing Systems

Knowledge graph completion (KGC) can be framed as a 3-order binary tensor completion task. Tensor decomposition-based (TDB) models have demonstrated strong performance in KGC. In this paper, we provide a summary of existing TDB models and derive a general form for them, serving as a foundation for further exploration of TDB models. Despite the expressiveness of TDB models, they are prone to overfitting. Existing regularization methods merely minimize the norms of embeddings to regularize the model, leading to suboptimal performance.


Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model

Neural Information Processing Systems

Knowledge in materials science is widely dispersed across extensive scientific literature, posing significant challenges for efficient discovery and integration of new materials. Traditional methods, often reliant on costly and time-consuming experimental approaches, further complicate rapid innovation. Addressing these challenges, the integration of artificial intelligence with materials science has opened avenues for accelerating the discovery process, though it also demands precise annotation, data extraction, and traceability of information. To tackle these issues, this article introduces the Materials Knowledge Graph (MKG), which utilizes advanced natural language processing techniques, integrated with large language models to extract and systematically organize a decade's worth of high-quality research into structured triples, contains 162,605 nodes and 731,772 edges. MKG categorizes information into comprehensive labels such as Name, Formula, and Application, structured around a meticulously designed ontology, thus enhancing data usability and integration.


DECRL: A Deep Evolutionary Clustering Jointed Temporal Knowledge Graph Representation Learning Approach

Neural Information Processing Systems

Temporal Knowledge Graph (TKG) representation learning aims to map temporal evolving entities and relations to embedded representations in a continuous low-dimensional vector space. To this end, we propose a Deep Evolutionary Clustering jointed temporal knowledge graph Representation Learning approach (DECRL). Specifically, a deep evolutionary clustering module is proposed to capture the temporal evolution of high-order correlations among entities. Furthermore, a cluster-aware unsupervised alignment mechanism is introduced to ensure the precise one-to-one alignment of soft overlapping clusters across timestamps, thereby maintaining the temporal smoothness of clusters. In addition, an implicit correlation encoder is introduced to capture latent correlations between any pair of clusters under the guidance of a global graph.


Is Architectural Complexity Overrated? Competitive and Interpretable Knowledge Graph Completion with RelatE

arXiv.org Artificial Intelligence

We revisit the efficacy of simple, real-valued embedding models for knowledge graph completion and introduce RelatE, an interpretable and modular method that efficiently integrates dual representations for entities and relations. RelatE employs a real-valued phase-modulus decomposition, leveraging sinusoidal phase alignments to encode relational patterns such as symmetry, inversion, and composition. In contrast to recent approaches based on complex-valued embeddings or deep neural architectures, RelatE preserves architectural simplicity while achieving competitive or superior performance on standard benchmarks. Empirically, RelatE outperforms prior methods across several datasets: on YAGO3-10, it achieves an MRR of 0.521 and Hit@10 of 0.680, surpassing all baselines. Additionally, RelatE offers significant efficiency gains, reducing training time by 24%, inference latency by 31%, and peak GPU memory usage by 22% compared to RotatE. Perturbation studies demonstrate improved robustness, with MRR degradation reduced by up to 61% relative to TransE and by up to 19% compared to RotatE under structural edits such as edge removals and relation swaps. Formal analysis further establishes the model's full expressiveness and its capacity to represent essential first-order logical inference patterns. These results position RelatE as a scalable and interpretable alternative to more complex architectures for knowledge graph completion.


Robust Knowledge Graph Embedding via Denoising

arXiv.org Artificial Intelligence

We focus on obtaining robust knowledge graph embedding under perturbation in the embedding space. To address these challenges, we introduce a novel framework, Robust Knowledge Graph Embedding via Denoising, which enhances the robustness of KGE models on noisy triples. By treating KGE methods as energy-based models, we leverage the established connection between denoising and score matching, enabling the training of a robust denoising KGE model. Furthermore, we propose certified robustness evaluation metrics for KGE methods based on the concept of randomized smoothing. Through comprehensive experiments on benchmark datasets, our framework consistently shows superior performance compared to existing state-of-the-art KGE methods when faced with perturbed entity embedding.


UKnow: A Unified Knowledge Protocol with Multimodal Knowledge Graph Datasets for Reasoning and Vision-Language Pre-Training

Neural Information Processing Systems

This work presents a unified knowledge protocol, called UKnow, which facilitates knowledge-based studies from the perspective of data. Particularly focusing on visual and linguistic modalities, we categorize data knowledge into five unit types, namely, in-image, in-text, cross-image, cross-text, and image-text, and set up an efficient pipeline to help construct the multimodal knowledge graph from any data collection. Thanks to the logical information naturally contained in knowledge graph, organizing datasets under UKnow format opens up more possibilities of data usage compared to the commonly used image-text pairs. Following UKnow protocol, we collect, from public international news, a large-scale multimodal knowledge graph dataset that consists of 1,388,568 nodes (with 571,791 vision-related ones) and 3,673,817 triplets. The dataset is also annotated with rich event tags, including 11 coarse labels and 9,185 fine labels.


Clustering then Propagation: Select Better Anchors for Knowledge Graph Embedding

Neural Information Processing Systems

Traditional knowledge graph embedding (KGE) models map entities and relations to unique embedding vectors in a shallow lookup manner. As the scale of data becomes larger, this manner will raise unaffordable computational costs. Anchor-based strategies have been treated as effective ways to alleviate such efficiency problems by propagation on representative entities instead of the whole graph. However, most existing anchor-based KGE models select the anchors in a primitive manner, which limits their performance. To this end, we propose a novel anchor-based strategy for KGE, i.e., a relational clustering-based anchor selection strategy (RecPiece), where two characteristics are leveraged, i.e., (1) representative ability of the cluster centroids and (2) descriptive ability of relation types in KGs. Specifically, we first perform clustering over features of factual triplets instead of entities, where cluster number is naturally set as number of relation types since each fact can be characterized by its relation in KGs.


Large Language Models-guided Dynamic Adaptation for Temporal Knowledge Graph Reasoning

Neural Information Processing Systems

Temporal Knowledge Graph Reasoning (TKGR) is the process of utilizing temporal information to capture complex relations within a Temporal Knowledge Graph (TKG) to infer new knowledge. Conventional methods in TKGR typically depend on deep learning algorithms or temporal logical rules. However, deep learning-based TKGRs often lack interpretability, whereas rule-based TKGRs struggle to effectively learn temporal rules that capture temporal patterns. Recently, Large Language Models (LLMs) have demonstrated extensive knowledge and remarkable proficiency in temporal reasoning. Consequently, the employment of LLMs for Temporal Knowledge Graph Reasoning (TKGR) has sparked increasing interest among researchers. Nonetheless, LLMs are known to function as black boxes, making it challenging to comprehend their reasoning process.


A Prompt-Based Knowledge Graph Foundation Model for Universal In-Context Reasoning

Neural Information Processing Systems

Extensive knowledge graphs (KGs) have been constructed to facilitate knowledge-driven tasks across various scenarios. However, existing work usually develops separate reasoning models for different KGs, lacking the ability to generalize and transfer knowledge across diverse KGs and reasoning settings. In this paper, we propose a prompt-based KG foundation model via in-context learning, namely KG-ICL, to achieve a universal reasoning ability. Specifically, we introduce a prompt graph centered with a query-related example fact as context to understand the query relation. To encode prompt graphs with the generalization ability to unseen entities and relations in queries, we first propose a unified tokenizer that maps entities and relations in prompt graphs to predefined tokens. Then, we propose two message passing neural networks to perform prompt encoding and KG reasoning, respectively.