Goto

Collaborating Authors

 Semantic Networks


AGENTICT$^2$S:Robust Text-to-SPARQL via Agentic Collaborative Reasoning over Heterogeneous Knowledge Graphs for the Circular Economy

arXiv.org Artificial Intelligence

Question answering over heterogeneous knowledge graphs (KGQA) involves reasoning across diverse schemas, incomplete alignments, and distributed data sources. Existing text-to-SPARQL approaches rely on large-scale domain-specific fine-tuning or operate within single-graph settings, limiting their generalizability in low-resource domains and their ability to handle queries spanning multiple graphs. These challenges are particularly relevant in domains such as the circular economy, where information about classifications, processes, and emissions is distributed across independently curated knowledge graphs (KGs). We present AgenticT$^2$S, a modular framework that decomposes KGQA into subtasks managed by specialized agents responsible for retrieval, query generation, and verification. A scheduler assigns subgoals to different graphs using weak-to-strong alignment strategies. A two-stage verifier detects structurally invalid and semantically underspecified queries through symbolic validation and counterfactual consistency checks. Experiments on real-world circular economy KGs demonstrate that AgenticT$^2$S improves execution accuracy by 17.3% and triple level F$_1$ by 25.4% over the best baseline, while reducing the average prompt length by 46.4%. These results demonstrate the benefits of agent-based schema-aware reasoning for scalable KGQA and support decision-making in sustainability domains through robust cross-graph reasoning.


AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora

arXiv.org Artificial Intelligence

We present AutoSchemaKG, a framework for fully autonomous knowledge graph construction that eliminates the need for predefined schemas. Our system leverages large language models to simultaneously extract knowledge triples and induce comprehensive schemas directly from text, modeling both entities and events while employing conceptualization to organize instances into semantic categories. Processing over 50 million documents, we construct ATLAS (Automated Triple Linking And Schema induction), a family of knowledge graphs with 900+ million nodes and 5.9 billion edges. This approach outperforms state-of-the-art baselines on multi-hop QA tasks and enhances LLM factuality. Notably, our schema induction achieves 92\% semantic alignment with human-crafted schemas with zero manual intervention, demonstrating that billion-scale knowledge graphs with dynamically induced schemas can effectively complement parametric knowledge in large language models.


Exploring In-Context Learning for Frame-Semantic Parsing

arXiv.org Artificial Intelligence

Frame Semantic Parsing (FSP) entails identifying predicates and labeling their arguments according to Frame Semantics. This paper investigates the use of In-Context Learning (ICL) with Large Language Models (LLMs) to perform FSP without model fine-tuning. We propose a method that automatically generates task-specific prompts for the Frame Identification (FI) and Frame Semantic Role Labeling (FSRL) subtasks, relying solely on the FrameNet database. These prompts, constructed from frame definitions and annotated examples, are used to guide six different LLMs. Experiments are conducted on a subset of frames related to violent events. The method achieves competitive results, with F1 scores of 94.3% for FI and 77.4% for FSRL. The findings suggest that ICL offers a practical and effective alternative to traditional fine-tuning for domain-specific FSP tasks.


Unifying Post-hoc Explanations of Knowledge Graph Completions

arXiv.org Artificial Intelligence

Post-hoc explainability for Knowledge Graph Completion (KGC) lacks formalization and consistent evaluations, hindering reproducibility and cross-study comparisons. This paper argues for a unified approach to post-hoc explainability in KGC. First, we propose a general framework to characterize post-hoc explanations via multi-objective optimization, balancing their effectiveness and conciseness. This unifies existing post-hoc explainability algorithms in KGC and the explanations they produce. Next, we suggest and empirically support improved evaluation protocols using popular metrics like Mean Reciprocal Rank and Hits@ k . Finally, we stress the importance of interpretability as the ability of explanations to address queries meaningful to end-users. By unifying methods and refining evaluation standards, this work aims to make research in KGC explainability more reproducible and impactful.


Full Triple Matcher: Integrating all triple elements between heterogeneous Knowledge Graphs

arXiv.org Artificial Intelligence

Knowledge graphs (KGs) are powerful tools for representing and reasoning over structured information. Their main components include schema, identity, and context. While schema and identity matching are well-established in ontology and entity matching research, context matching remains largely unexplored. This is particularly important because real-world KGs often vary significantly in source, size, and information density - factors not typically represented in the datasets on which current entity matching methods are evaluated. As a result, existing approaches may fall short in scenarios where diverse and complex contexts need to be integrated. To address this gap, we propose a novel KG integration method consisting of label matching and triple matching. We use string manipulation, fuzzy matching, and vector similarity techniques to align entity and predicate labels. Next, we identify mappings between triples that convey comparable information, using these mappings to improve entity-matching accuracy. Our approach demonstrates competitive performance compared to leading systems in the OAEI competition and against supervised methods, achieving high accuracy across diverse test cases. Additionally, we introduce a new dataset derived from the benchmark dataset to evaluate the triple-matching step more comprehensively.


HypKG: Hypergraph-based Knowledge Graph Contextualization for Precision Healthcare

arXiv.org Artificial Intelligence

Knowledge graphs (KGs) are important products of the semantic web, which are widely used in various application domains. Healthcare is one of such domains where KGs are intensively used, due to the high requirement for knowledge accuracy and interconnected nature of healthcare data. However, KGs storing general factual information often lack the ability to account for important contexts of the knowledge such as the status of specific patients, which are crucial in precision healthcare. Meanwhile, electronic health records (EHRs) provide rich personal data, including various diagnoses and medications, which provide natural contexts for general KGs. In this paper, we propose HypKG, a framework that integrates patient information from EHRs into KGs to generate contextualized knowledge representations for accurate healthcare predictions. Using advanced entity-linking techniques, we connect relevant knowledge from general KGs with patient information from EHRs, and then utilize a hypergraph model to "contextualize" the knowledge with the patient information. Finally, we employ hypergraph transformers guided by downstream prediction tasks to jointly learn proper con-textualized representations for both KGs and patients, fully leveraging existing knowledge in KGs and patient contexts in EHRs. In experiments using a large biomedical KG and two real-world EHR datasets, HypKG demonstrates significant improvements in healthcare prediction tasks across multiple evaluation metrics. Additionally, by integrating external contexts, HypKG can learn to adjust the representations of entities and relations in KG, potentially improving the quality and real-world utility of knowledge.


Learning Attention-based Representations from Multiple Patterns for Relation Prediction in Knowledge Graphs

arXiv.org Artificial Intelligence

Knowledge bases, and their representations in the form of knowledge graphs (KGs), are naturally incomplete. Since scientific and industrial applications have extensively adopted them, there is a high demand for solutions that complete their information. Several recent works tackle this challenge by learning embeddings for entities and relations, then employing them to predict new relations among the entities. Despite their aggrandizement, most of those methods focus only on the local neighbors of a relation to learn the embeddings. As a result, they may fail to capture the KGs' context information by neglecting long-term dependencies and the propagation of entities' semantics. In this manuscript, we propose ร†MP (Attention-based Embeddings from Multiple Patterns), a novel model for learning contextualized representations by: (i) acquiring entities' context information through an attention-enhanced message-passing scheme, which captures the entities' local semantics while focusing on different aspects of their neighborhood; and (ii) capturing the semantic context, by leveraging the paths and their relationships between entities. Our empirical findings draw insights into how attention mechanisms can improve entities' context representation and how combining entities and semantic path contexts improves the general representation of entities and the relation predictions. Experimental results on several large and small knowledge graph benchmarks show that ร†MP either outperforms or competes with state-of-the-art relation prediction methods.


Complementarity-driven Representation Learning for Multi-modal Knowledge Graph Completion

arXiv.org Artificial Intelligence

Multi-modal Knowledge Graph Completion (MMKGC) aims to uncover hidden world knowledge in multimodal knowledge graphs by leveraging both multimodal and structural entity information. However, the inherent imbalance in multimodal knowledge graphs, where modality distributions vary across entities, poses challenges in utilizing additional modality data for robust entity representation. Existing MMKGC methods typically rely on attention or gate-based fusion mechanisms but overlook complementarity contained in multi-modal data. In this paper, we propose a novel framework named Mixture of Complementary Modality Experts (MoCME), which consists of a Complementarity-guided Modality Knowledge Fusion (CMKF) module and an Entropy-guided Negative Sampling (EGNS) mechanism. Additionally, we introduce an Entropy-guided Negative Sampling mechanism to dynamically prioritize informative and uncertain negative samples to enhance training effectiveness and model robustness. Extensive experiments on five benchmark datasets demonstrate that our MoCME achieves state-of-the-art performance, surpassing existing approaches. Introduction Knowledge graphs (KGs) [1, 2, 3, 4, 5] model real-world knowledge through structured representations in the form of triples--comprising a head entity, a relation, and a tail entity--which are typically constructed manually based on existing databases. However, the inherent incompleteness of KGs [6, 7], coupled with the high cost of annotating factual triples, has given rise to the task of Knowledge Graph Completion (KGC), which aims to predict and infer missing but plausible triples within an existing knowledge graph. Conventional KGC methods [1, 2, 3, 4] predominantly rely on Knowledge Graph Embedding (KGE) techniques, where entities and relations are embedded into continuous vector spaces to learn structural representations that model the relational patterns of triples and assess their plausibility .


Unlearning of Knowledge Graph Embedding via Preference Optimization

arXiv.org Artificial Intelligence

Existing knowledge graphs (KGs) inevitably contain outdated or erroneous knowledge that needs to be removed from knowledge graph embedding (KGE) models. To address this challenge, knowledge unlearning can be applied to eliminate specific information while preserving the integrity of the remaining knowledge in KGs. Existing unlearning methods can generally be categorized into exact unlearning and approximate unlearning. However, exact unlearning requires high training costs while approximate unlearning faces two issues when applied to KGs due to the inherent connectivity of triples: (1) It fails to fully remove targeted information, as forgetting triples can still be inferred from remaining ones. (2) It focuses on local data for specific removal, which weakens the remaining knowledge in the forgetting boundary. To address these issues, we propose GraphDPO, a novel approximate unlearning framework based on direct preference optimization (DPO). Firstly, to effectively remove forgetting triples, we reframe unlearning as a preference optimization problem, where the model is trained by DPO to prefer reconstructed alternatives over the original forgetting triples. This formulation penalizes reliance on forgettable knowledge, mitigating incomplete forgetting caused by KG connectivity. Moreover, we introduce an out-boundary sampling strategy to construct preference pairs with minimal semantic overlap, weakening the connection between forgetting and retained knowledge. Secondly, to preserve boundary knowledge, we introduce a boundary recall mechanism that replays and distills relevant information both within and across time steps. We construct eight unlearning datasets across four popular KGs with varying unlearning rates. Experiments show that GraphDPO outperforms state-of-the-art baselines by up to 10.1% in MRR_Avg and 14.0% in MRR_F1.


mRAKL: Multilingual Retrieval-Augmented Knowledge Graph Construction for Low-Resourced Languages

arXiv.org Artificial Intelligence

Knowledge Graphs represent real-world entities and the relationships between them. Multilingual Knowledge Graph Construction (mKGC) refers to the task of automatically constructing or predicting missing entities and links for knowledge graphs in a multilingual setting. In this work, we reformulate the mKGC task as a Question Answering (QA) task and introduce mRAKL: a Retrieval-Augmented Generation (RAG) based system to perform mKGC. We achieve this by using the head entity and linking relation in a question, and having our model predict the tail entity as an answer. Our experiments focus primarily on two low-resourced languages: Tigrinya and Amharic. We experiment with using higher-resourced languages Arabic and English for cross-lingual transfer. With a BM25 retriever, we find that the RAG-based approach improves performance over a no-context setting. Further, our ablation studies show that with an idealized retrieval system, mRAKL improves accuracy by 4.92 and 8.79 percentage points for Tigrinya and Amharic, respectively.