Ontologies
Semantic relatedness in DBpedia: A comparative and experimental assessment
Formica, Anna, Taglino, Francesco
Evaluating semantic relatedness of Web resources is still an open challenge. This paper focuses on knowledge-based methods, which represent an alternative to corpus-based approaches and rely in general on the availability of knowledge graphs. In particular, we have selected 10 methods from the existing literature, which have been organized into adjacent resources, triple patterns, and triple weights-based methods. They have been implemented and evaluated by using DBpedia as the reference RDF knowledge graph. Since DBpedia is continuously evolving, the experimental results provided by these methods in the literature are not comparable. For this reason, in this work, these methods have been evaluated by running them all at once on the same DBpedia release and against 14 well-known golden datasets. On the basis of the correlation values with human judgment obtained in the experiments, weighting the RDF triples in combination with evaluating all the directed paths linking the compared resources is the best strategy for computing semantic relatedness in DBpedia.
Document Automation Architectures: Updated Survey in Light of Large Language Models
Achachlouei, Mohammad Ahmadi, Patil, Omkar, Joshi, Tarun, Nair, Vijayan N.
This paper surveys the current state of the art in document automation (DA). The objective of DA is to reduce the manual effort during the generation of documents by automatically creating and integrating input from different sources and assembling documents conforming to defined templates. There have been reviews of commercial solutions of DA, particularly in the legal domain, but to date there has been no comprehensive review of the academic research on DA architectures and technologies. The current survey of DA reviews the academic literature and provides a clearer definition and characterization of DA and its features, identifies state-of-the-art DA architectures and technologies in academic research, and provides ideas that can lead to new research opportunities within the DA field in light of recent advances in generative AI and large language models.
Conversational Ontology Alignment with ChatGPT
Norouzi, Sanaz Saki, Mahdavinejad, Mohammad Saeid, Hitzler, Pascal
This study evaluates the applicability and efficiency of ChatGPT for ontology alignment using a naive approach. ChatGPT's output is compared to the results of the Ontology Alignment Evaluation Initiative 2022 campaign using conference track ontologies. This comparison is intended to provide insights into the capabilities of a conversational large language model when used in a naive way for ontology matching, and to investigate the potential advantages and disadvantages of this approach.
PDPK: A Framework to Synthesise Process Data and Corresponding Procedural Knowledge for Manufacturing
Nordsieck, Richard, Schweizer, André, Heider, Michael, Hähner, Jörg
Procedural knowledge describes how to accomplish tasks and mitigate problems. Such knowledge is commonly held by domain experts, e.g. operators in manufacturing who adjust parameters to achieve quality targets. To the best of our knowledge, no real-world datasets containing process data and corresponding procedural knowledge are publicly available, possibly due to corporate apprehensions regarding the loss of knowledge advances. Therefore, we provide a framework to generate synthetic datasets that can be adapted to different domains. The design choices are inspired by two real-world datasets of procedural knowledge we have access to. Apart from containing representations of procedural knowledge in Resource Description Framework (RDF)-compliant knowledge graphs, the framework simulates parametrisation processes and provides consistent process data. We compare established embedding methods on the resulting knowledge graphs, detailing which out-of-the-box methods have the potential to represent procedural knowledge. This provides a baseline which can be used to increase the comparability of future work. Furthermore, we validate the overall characteristics of a synthesised dataset by comparing the results to those achievable on a real-world dataset. The framework and evaluation code, as well as the dataset used in the evaluation, are available open source.
Towards Ontology-Mediated Planning with OWL DL Ontologies (Extended Version)
John, Tobias, Koopmann, Patrick
While classical planning languages make the closed-domain and closed-world assumption, there have been various approaches to extend those with DL reasoning, which is then interpreted under the usual open-world semantics. Current approaches for planning with DL ontologies integrate the DL directly into the planning language, and practical approaches have been developed based on first-order rewritings or rewritings into datalog. We present here a new approach in which the planning specification and ontology are kept separate, and are linked together using an interface. This allows planning experts to work in a familiar formalism, while existing ontologies can be easily integrated and extended by ontology experts. Our approach for planning with those ontology-mediated planning problems is optimized for cases with comparatively small domains, and supports the whole OWL DL fragment. The idea is to rewrite the ontology-mediated planning problem into a classical planning problem to be processed by existing planning tools. Unlike other approaches, our rewriting is data-dependent. A first experimental evaluation shows the potential and limitations of this approach.
Box$^2$EL: Concept and Role Box Embeddings for the Description Logic EL++
Jackermeier, Mathias, Chen, Jiaoyan, Horrocks, Ian
Description logic (DL) ontologies extend knowledge graphs (KGs) with conceptual information and logical background knowledge. In recent years, there has been growing interest in inductive reasoning techniques for such ontologies, which promise to complement classical deductive reasoning algorithms. Similar to KG completion, several existing approaches learn ontology embeddings in a latent space, while additionally ensuring that they faithfully capture the logical semantics of the underlying DL. However, they suffer from several shortcomings, mainly due to a limiting role representation. We propose Box$^2$EL, which represents both concepts and roles as boxes (i.e., axis-aligned hyperrectangles) and demonstrate how it overcomes the limitations of previous methods. We theoretically prove the soundness of our model and conduct an extensive experimental evaluation, achieving state-of-the-art results across a variety of datasets. As part of our evaluation, we introduce a novel benchmark for subsumption prediction involving both atomic and complex concepts.
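The geometric idea behind box embeddings can be illustrated with a minimal sketch. It assumes only what the abstract states (concepts as axis-aligned hyperrectangles) plus the common convention in box-based ontology embeddings that a subsumption C ⊑ D corresponds to the box of C lying inside the box of D. The `Box` class, the `inclusion_violation` function, and the example coordinates are hypothetical illustrations, not the Box$^2$EL model or its loss.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned hyperrectangle given by its lower and upper corners (hypothetical helper)."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]

    def contains(self, other: "Box") -> bool:
        # `other` lies inside `self` iff it does so in every dimension.
        return all(
            sl <= ol and ou <= su
            for sl, su, ol, ou in zip(self.lower, self.upper, other.lower, other.upper)
        )


def inclusion_violation(sub: Box, sup: Box) -> float:
    """Total amount by which `sub` sticks out of `sup`, summed over dimensions.

    Zero exactly when `sup.contains(sub)`; an illustrative penalty only,
    not the loss used in the paper.
    """
    below = sum(max(0.0, pl - bl) for bl, pl in zip(sub.lower, sup.lower))
    above = sum(max(0.0, bu - pu) for bu, pu in zip(sub.upper, sup.upper))
    return below + above


# Made-up 2D embeddings for two concepts.
dog = Box(lower=(0.2, 0.2), upper=(0.4, 0.5))
animal = Box(lower=(0.0, 0.0), upper=(1.0, 1.0))

print(animal.contains(dog))              # Dog ⊑ Animal holds geometrically
print(inclusion_violation(animal, dog))  # positive: Animal ⊑ Dog is violated
```

A penalty of this kind is differentiable almost everywhere, which is one reason containment between boxes is a convenient geometric stand-in for subsumption when embeddings are learned by gradient descent.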
Why Not? Explaining Missing Entailments with Evee (Technical Report)
Alrabbaa, Christian, Borgwardt, Stefan, Friese, Tom, Koopmann, Patrick, Kotlov, Mikhail
We present a Protégé plugin for explaining missing entailments from OWL ontologies. The importance of explaining description logic reasoning to end-users has long been understood, and has been studied in many forms over the past decades. Indeed, explainability is one of the main advantages of logic-based knowledge representations over sub-symbolic methods. The first approaches to explain why a consequence follows from a Description Logic (DL) ontology were based on step-by-step proofs [8, 18], but soon research focused on justifications [7, 11, 20] that are easier to compute, but still very useful for pointing out the axioms responsible for an entailment. Consequently, the ontology editor Protégé supports black-box methods for computing justifications for arbitrary OWL DL ontologies [12]. More recently, a series of papers investigated different methods of computing good proofs for entailments in DLs ranging from EL to ALCOI [13, 1, 2, 3], and the Protégé plug-ins proof-explanation [13] and Evee [4], as well as the web-based application Evonne [19], were developed to make these algorithms available to ontology engineers. While reasoning can sometimes reveal unexpected entailments that need explaining, very often the problem is not what is entailed, but what is not entailed. In order to explain such missing entailments, and offer suggestions on how to repair them, both counterexamples and abduction have been suggested in the literature.
Knowledge Prompt-tuning for Sequential Recommendation
Zhai, Jianyang, Zheng, Xiawu, Wang, Chang-Dong, Li, Hui, Tian, Yonghong
Pre-trained language models (PLMs) have demonstrated strong performance in sequential recommendation (SR), where they are utilized to extract general knowledge. However, existing methods still lack domain knowledge and struggle to capture users' fine-grained preferences. Meanwhile, many traditional SR methods alleviate this issue by integrating side information while suffering from information loss. To summarize, we believe that a good recommendation system should utilize both general and domain knowledge simultaneously. Therefore, we introduce an external knowledge base and propose Knowledge Prompt-tuning for Sequential Recommendation (KP4SR). Specifically, we construct a set of relationship templates and transform a structured knowledge graph (KG) into knowledge prompts to solve the problem of the semantic gap. However, knowledge prompts disrupt the original data structure and introduce a significant amount of noise. We further construct a knowledge tree and propose a knowledge tree mask, which restores the data structure in a mask matrix form, thus mitigating the noise problem. We evaluate KP4SR on three real-world datasets, and experimental results show that our approach outperforms state-of-the-art methods on multiple evaluation metrics. Specifically, compared with PLM-based methods, our method improves NDCG@5 and HR@5 by 40.65% and 36.42% on the books dataset, 11.17% and 11.47% on the music dataset, and 22.17% and 19.14% on the movies dataset, respectively. Our code is publicly available at https://github.com/zhaijianyang/KP4SR.
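The step of turning knowledge graph triples into knowledge prompts through relationship templates can be sketched as follows. This is a minimal illustration under stated assumptions: the relation names, the template wording, the fallback rule, and the function names `verbalize` and `build_knowledge_prompt` are all hypothetical and are not taken from the KP4SR code base.

```python
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Hypothetical relationship templates: one natural-language pattern per relation.
TEMPLATES: Dict[str, str] = {
    "directed_by": "{head} is directed by {tail}.",
    "has_genre": "The genre of {head} is {tail}.",
}


def verbalize(triple: Triple, templates: Dict[str, str] = TEMPLATES) -> str:
    """Render one triple as a sentence using its relation's template."""
    head, relation, tail = triple
    # Assumed fallback for relations without a template: spell out the relation name.
    fallback = "{head} " + relation.replace("_", " ") + " {tail}."
    return templates.get(relation, fallback).format(head=head, tail=tail)


def build_knowledge_prompt(triples: Iterable[Triple]) -> str:
    """Concatenate the verbalized triples into a single textual prompt."""
    return " ".join(verbalize(t) for t in triples)


triples = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Inception", "has_genre", "science fiction"),
]
print(build_knowledge_prompt(triples))
```

Flattening triples into one string in this way discards the graph structure, which is the loss of structure and the source of noise that the abstract says the knowledge tree mask is designed to mitigate.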
LeafAI: query generator for clinical cohort discovery rivaling a human programmer
Dobbins, Nicholas J, Han, Bin, Zhou, Weipeng, Lan, Kristine, Kim, H. Nina, Harrington, Robert, Uzuner, Ozlem, Yetisgen, Meliha
Objective: Identifying study-eligible patients within clinical databases is a critical step in clinical research. However, accurate query design typically requires extensive technical and biomedical expertise. We sought to create a system capable of generating data model-agnostic queries while also providing novel logical reasoning capabilities for complex clinical trial eligibility criteria. Materials and Methods: The task of query creation from eligibility criteria requires solving several text-processing problems, including named entity recognition and relation extraction, sequence-to-sequence transformation, normalization, and reasoning. We incorporated hybrid deep learning and rule-based modules for these, as well as a knowledge base of the Unified Medical Language System (UMLS) and linked ontologies. To enable data-model agnostic query creation, we introduce a novel method for tagging database schema elements using UMLS concepts. To evaluate our system, called LeafAI, we compared the capability of LeafAI to a human database programmer to identify patients who had been enrolled in 8 clinical trials conducted at our institution. We measured performance by the number of actual enrolled patients matched by generated queries. Results: LeafAI matched a mean 43% of enrolled patients with 27,225 eligible across 8 clinical trials, compared to 27% matched and 14,587 eligible in queries by a human database programmer. The human programmer spent 26 total hours crafting queries compared to several minutes by LeafAI. Conclusions: Our work contributes a state-of-the-art data model-agnostic query generation system capable of conditional reasoning using a knowledge base. We demonstrate that LeafAI can rival an experienced human programmer in finding patients eligible for clinical trials.
Decidability of Querying First-Order Theories via Countermodels of Finite Width
Feller, Thomas, Lyon, Tim S., Ostropolski-Nalewaja, Piotr, Rudolph, Sebastian
We propose a generic framework for establishing the decidability of a wide range of logical entailment problems (briefly called querying), based on the existence of countermodels that are structurally simple, gauged by certain types of width measures (with treewidth and cliquewidth as popular examples). As an important special case of our framework, we identify logics exhibiting width-finite finitely universal model sets, warranting decidable entailment for a wide range of homomorphism-closed queries, subsuming a diverse set of practically relevant query languages. As a particularly powerful width measure, we propose Blumensath's partitionwidth, which subsumes various other commonly considered width measures and exhibits highly favorable computational and structural properties. Focusing on the formalism of existential rules as a popular showcase, we explain how finite-partitionwidth sets of rules subsume other known abstract decidable classes but - leveraging existing notions of stratification - also cover a wide range of new rulesets. We expose natural limitations for fitting the class of finite-unification sets into our picture and provide several options for remedy.