Goto

Collaborating Authors

 subsumption




Language Models as Ontology Encoders

Yang, Hui, Chen, Jiaoyan, He, Yuan, Gao, Yongsheng, Horrocks, Ian

arXiv.org Artificial Intelligence

OWL (Web Ontology Language) ontologies which are able to formally represent complex knowledge and support semantic reasoning have been widely adopted across various domains such as healthcare and bioinformatics. Recently, ontology embeddings have gained wide attention due to its potential to infer plausible new knowledge and approximate complex reasoning. However, existing methods face notable limitations: geometric model-based embeddings typically overlook valuable textual information, resulting in suboptimal performance, while the approaches that incorporate text, which are often based on language models, fail to preserve the logical structure. In this work, we propose a new ontology embedding method OnT, which tunes a Pretrained Language Model (PLM) via geometric modeling in a hyperbolic space for effectively incorporating textual labels and simultaneously preserving class hierarchies and other logical relationships of Description Logic EL. Extensive experiments on four real-world ontologies show that OnT consistently outperforms the baselines including the state-of-the-art across both tasks of prediction and inference of axioms. OnT also demonstrates strong potential in real-world applications, indicated by its robust transfer learning abilities and effectiveness in real cases of constructing a new ontology from SNOMED CT. Data and code are available at https://github.com/HuiYang1997/OnT.


VEL: A Formally Verified Reasoner for OWL2 EL Profile

Ileri, Atalay Mert, Rangarajan, Nalen, Cannell, Jack, McGinty, Hande

arXiv.org Artificial Intelligence

Over the past two decades, the Web Ontology Language (OWL) has been instrumental in advancing the development of ontologies and knowledge graphs, providing a structured framework that enhances the semantic integration of data. However, the reliability of deductive reasoning within these systems remains challenging, as evidenced by inconsistencies among popular reasoners in recent competitions. This evidence underscores the limitations of current testing-based methodologies, particularly in high-stakes domains such as healthcare. To mitigate these issues, in this paper, we have developed VEL, a formally verified EL++ reasoner equipped with machine-checkable correctness proofs that ensure the validity of outputs across all possible inputs. This formalization, based on the algorithm of Baader et al., has been transformed into executable OCaml code using the Coq proof assistant's extraction capabilities. Our formalization revealed several errors in the original completeness proofs, which led to changes to the algorithm to ensure its completeness. Our work demonstrates the necessity of mechanization of reasoning algorithms to ensure their correctness at theoretical and implementation levels.


Can Structured Data Reduce Epistemic Uncertainty?

S, Shriram M, S, Sushmitha, S, Gayathri K, A, Shahina

arXiv.org Artificial Intelligence

One of the main issues with the current In the current era of Large Language Models (LLMs), with retrieval approaches using Retrieval-Augmented Generation an abundance of data, there is always a tricky question to is hallucination, where the model gives out irrelevant, be addressed: Is providing an abundance of data enough to incorrect, and unreal responses. By incorporating subsumptions solve complex tasks? The majority of modern-day models in the prompt, we ensure hallucination is minimized are fundamentally probabilistic, which though highly powerful and the response of the Language Model is more contextually in its way, gives the model only an uncertain output and factually intact. Section 4 presents key insights that cannot be reasoned out. This uncertainty is of 2 from our experimentation with ontologies in the medical domain, types, epistemic (EU) and aleatoric (AU), where the former demonstrating how our methodology could be used is also called reducible uncertainty, caused due to the lack of for quicker training and reducing hallucinations in LLMs.


Can Large Language Models Understand DL-Lite Ontologies? An Empirical Study

Wang, Keyu, Qi, Guilin, Li, Jiaqi, Zhai, Songlin

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown significant achievements in solving a wide range of tasks. Recently, LLMs' capability to store, retrieve and infer with symbolic knowledge has drawn a great deal of attention, showing their potential to understand structured information. However, it is not yet known whether LLMs can understand Description Logic (DL) ontologies. In this work, we empirically analyze the LLMs' capability of understanding DL-Lite ontologies covering 6 representative tasks from syntactic and semantic aspects. With extensive experiments, we demonstrate both the effectiveness and limitations of LLMs in understanding DL-Lite ontologies. We find that LLMs can understand formal syntax and model-theoretic semantics of concepts and roles. However, LLMs struggle with understanding TBox NI transitivity and handling ontologies with large ABoxes. We hope that our experiments and analyses provide more insights into LLMs and inspire to build more faithful knowledge engineering solutions.


Language Models as Hierarchy Encoders

He, Yuan, Yuan, Zhangdie, Chen, Jiaoyan, Horrocks, Ian

arXiv.org Artificial Intelligence

Interpreting hierarchical structures latent in language is a key limitation of current language models (LMs). While previous research has implicitly leveraged these hierarchies to enhance LMs, approaches for their explicit encoding are yet to be explored. To address this, we introduce a novel approach to re-train transformer encoder-based LMs as Hierarchy Transformer encoders (HiTs), harnessing the expansive nature of hyperbolic space. Our method situates the output embedding space of pre-trained LMs within a Poincar\'e ball with a curvature that adapts to the embedding dimension, followed by re-training on hyperbolic cluster and centripetal losses. These losses are designed to effectively cluster related entities (input as texts) and organise them hierarchically. We evaluate HiTs against pre-trained and fine-tuned LMs, focusing on their capabilities in simulating transitive inference, predicting subsumptions, and transferring knowledge across hierarchies. The results demonstrate that HiTs consistently outperform both pre-trained and fine-tuned LMs in these tasks, underscoring the effectiveness and transferability of our re-trained hierarchy encoders.


Generalisation Through Negation and Predicate Invention

Cerna, David M., Cropper, Andrew

arXiv.org Artificial Intelligence

The ability to generalise from a small number of examples is a fundamental challenge in machine learning. To tackle this challenge, we introduce an inductive logic programming (ILP) approach that combines negation and predicate invention. Combining these two features allows an ILP system to generalise better by learning rules with universally quantified body-only variables. We implement our idea in NOPI, which can learn normal logic programs with predicate invention, including Datalog programs with stratified negation. Our experimental results on multiple domains show that our approach can improve predictive accuracies and learning times.


DeepOnto: A Python Package for Ontology Engineering with Deep Learning

He, Yuan, Chen, Jiaoyan, Dong, Hang, Horrocks, Ian, Allocca, Carlo, Kim, Taehun, Sapkota, Brahmananda

arXiv.org Artificial Intelligence

Applying deep learning techniques, particularly language models (LMs), in ontology engineering has raised widespread attention. However, deep learning frameworks like PyTorch and Tensorflow are predominantly developed for Python programming, while widely-used ontology APIs, such as the OWL API and Jena, are primarily Java-based. To facilitate seamless integration of these frameworks and APIs, we present Deeponto, a Python package designed for ontology engineering. The package encompasses a core ontology processing module founded on the widely-recognised and reliable OWL API, encapsulating its fundamental features in a more "Pythonic" manner and extending its capabilities to include other essential components including reasoning, verbalisation, normalisation, projection, and more. Building on this module, Deeponto offers a suite of tools, resources, and algorithms that support various ontology engineering tasks, such as ontology alignment and completion, by harnessing deep learning methodologies, primarily pre-trained LMs. In this paper, we also demonstrate the practical utility of Deeponto through two use-cases: the Digital Health Coaching in Samsung Research UK and the Bio-ML track of the Ontology Alignment Evaluation Initiative (OAEI).


Contextual Semantic Embeddings for Ontology Subsumption Prediction

Chen, Jiaoyan, He, Yuan, Geng, Yuxia, Jimenez-Ruiz, Ernesto, Dong, Hang, Horrocks, Ian

arXiv.org Artificial Intelligence

Automating ontology construction and curation is an important but challenging task in knowledge engineering and artificial intelligence. Prediction by machine learning techniques such as contextual semantic embedding is a promising direction, but the relevant research is still preliminary especially for expressive ontologies in Web Ontology Language (OWL). In this paper, we present a new subsumption prediction method named BERTSubs for classes of OWL ontology. It exploits the pre-trained language model BERT to compute contextual embeddings of a class, where customized templates are proposed to incorporate the class context (e.g., neighbouring classes) and the logical existential restriction. BERTSubs is able to predict multiple kinds of subsumers including named classes from the same ontology or another ontology, and existential restrictions from the same ontology. Extensive evaluation on five real-world ontologies for three different subsumption tasks has shown the effectiveness of the templates and that BERTSubs can dramatically outperform the baselines that use (literal-aware) knowledge graph embeddings, non-contextual word embeddings and the state-of-the-art OWL ontology embeddings.