Ontologies
Can Structured Data Reduce Epistemic Uncertainty?
S, Shriram M, S, Sushmitha, S, Gayathri K, A, Shahina
One of the main issues with current retrieval approaches that use Retrieval-Augmented Generation is hallucination, where the model produces irrelevant, incorrect, or fabricated responses. By incorporating subsumptions in the prompt, we minimize hallucination and keep the language model's responses more contextually and factually sound. Section 4 presents key insights from our experimentation with ontologies in the medical domain, demonstrating how our methodology could be used for quicker training and for reducing hallucinations in LLMs.
In the current era of Large Language Models (LLMs), with an abundance of data, there is always a tricky question to be addressed: is providing an abundance of data enough to solve complex tasks? The majority of modern-day models are fundamentally probabilistic, which, though highly powerful in its way, gives the model only an uncertain output that cannot be reasoned out. This uncertainty is of two types, epistemic (EU) and aleatoric (AU), where the former, also called reducible uncertainty, is caused by the lack of …
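To make "incorporating subsumptions in the prompt" concrete, here is a minimal sketch, assuming a toy ontology stored as a child-to-parent dictionary. The names `SUBCLASS_OF`, `ancestors`, and `build_prompt`, and the medical concepts, are hypothetical illustrations rather than the authors' implementation: the idea is simply to walk each mentioned concept's SubClassOf chain and prepend those facts to the question.

```python
# Hypothetical toy ontology: each concept maps to its direct superclass,
# i.e. one subsumption axiom "child SubClassOf parent" per entry.
SUBCLASS_OF = {
    "Type2Diabetes": "DiabetesMellitus",
    "DiabetesMellitus": "EndocrineDisorder",
    "EndocrineDisorder": "Disease",
}


def ancestors(concept: str) -> list:
    """Follow SubClassOf links upward and return every superclass in order."""
    chain, seen = [], {concept}
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        if concept in seen:  # stop on a cycle in a malformed ontology
            break
        seen.add(concept)
        chain.append(concept)
    return chain


def build_prompt(question: str, concepts: list) -> str:
    """Prepend the subsumption facts for each mentioned concept to the question."""
    facts = []
    for concept in concepts:
        child = concept
        for parent in ancestors(concept):
            facts.append(f"{child} is a subclass of {parent}.")
            child = parent
    facts = list(dict.fromkeys(facts))  # drop duplicate facts, keep order
    return "Background facts:\n" + "\n".join(facts) + f"\n\nQuestion: {question}"
```

How the concepts are detected in the question, and how the facts are phrased, are open design choices; this sketch takes the concept list as given.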
Using off-the-shelf LLMs to query enterprise data by progressively revealing ontologies
Civili, C., Sherkhonov, E., Stirewalt, R. E. K.
Using Large Language Models (LLMs) to generate database queries is an area of active research. In [4], Sequeda et al. argue that knowledge graphs (KGs) with rich ontologies can enable an LLM to answer queries of enterprise complexity, noting that text-to-SQL benchmarks such as Spider [6] are not tailored to such queries. In addition to query complexity, an equally challenging problem in the enterprise setting is schema complexity, where the ontology itself is large and complex. This paper contributes an approach to using off-the-shelf LLMs and enterprise-scale ontologies to answer natural language questions on large data sets. We address the schema complexity problem by incrementally revealing "just enough" of an ontology that is needed to answer a given question.
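One plausible reading of "incrementally revealing just enough of an ontology" is a neighborhood expansion: start from the classes mentioned in the question and widen the revealed fragment one relation hop at a time. The sketch below assumes a toy class graph and naive keyword matching; `ONTOLOGY`, `seed_classes`, and `reveal` are hypothetical names, and the stopping rule (when the LLM has "enough") is left to the caller.

```python
from collections import deque

# Hypothetical enterprise ontology fragment: class -> directly related classes.
ONTOLOGY = {
    "Customer": ["Order", "Address"],
    "Order": ["Customer", "Product", "Invoice"],
    "Product": ["Order", "Supplier"],
    "Invoice": ["Order", "Payment"],
    "Payment": ["Invoice"],
    "Address": ["Customer"],
    "Supplier": ["Product"],
}


def seed_classes(question: str) -> set:
    """Pick classes whose names appear in the question (naive keyword match)."""
    q = question.lower()
    return {c for c in ONTOLOGY if c.lower() in q}


def reveal(seeds: set, hops: int) -> set:
    """Breadth-first search: return classes within `hops` relations of the seeds."""
    revealed = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        cls, depth = frontier.popleft()
        if depth == hops:
            continue
        for neighbor in ONTOLOGY.get(cls, []):
            if neighbor not in revealed:
                revealed.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return revealed
```

A caller would typically try `hops=0`, then `1`, then `2`, passing only the revealed fragment to the LLM each time, so the prompt grows only when the smaller fragment proves insufficient.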
Continual Learning with Evolving Class Ontologies
Lifelong learners must recognize concept vocabularies that evolve over time. A common yet underexplored scenario is learning with class labels that continually refine/expand old classes. For example, humans learn to recognize "dog" before dog breeds. In practical settings, dataset versioning often introduces refinements to ontologies, such as autonomous vehicle benchmarks that refine a previous "vehicle" class into "school-bus" as autonomous operations expand to new cities. This paper formalizes a protocol for studying the problem of Learning with Evolving Class Ontology (LECO).
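When an ontology is refined, older data still carries only the coarse labels. One common way to reuse it, shown here as an assumption rather than as the paper's protocol, is to predict over the new fine classes and marginalize their probabilities into the old coarse classes, so coarse-labeled samples still yield a loss. The mapping `FINE_TO_COARSE` and the class names are hypothetical.

```python
import math

# Hypothetical refinement: a new dataset version splits the old coarse class
# "vehicle" into finer classes, while "dog" is left unrefined.
FINE_TO_COARSE = {
    "school-bus": "vehicle",
    "car": "vehicle",
    "truck": "vehicle",
    "dog": "dog",
}
FINE_CLASSES = list(FINE_TO_COARSE)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_probs(fine_logits):
    """Sum fine-class probabilities into their old coarse-class parents."""
    out = {}
    for cls, p in zip(FINE_CLASSES, softmax(fine_logits)):
        coarse = FINE_TO_COARSE[cls]
        out[coarse] = out.get(coarse, 0.0) + p
    return out


def coarse_nll(fine_logits, coarse_label):
    """Negative log-likelihood for an old-version sample with only a coarse label."""
    return -math.log(coarse_probs(fine_logits)[coarse_label])
```

With this loss, an old sample labeled "vehicle" pushes probability mass toward any vehicle subclass without forcing a choice among them.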
Towards Assurance of LLM Adversarial Robustness using Ontology-Driven Argumentation
Momcilovic, Tomas Bueno, Buesser, Beat, Zizzo, Giulio, Purcell, Mark, Balta, Dian
Despite the impressive adaptability of large language models (LLMs), challenges remain in ensuring their security, transparency, and interpretability. Given their susceptibility to adversarial attacks, LLMs need to be defended with an evolving combination of adversarial training and guardrails. However, managing the implicit and heterogeneous knowledge needed to continuously assure robustness is difficult. We introduce a novel approach for assuring the adversarial robustness of LLMs based on formal argumentation. Using ontologies for formalization, we structure state-of-the-art attacks and defenses, facilitating the creation of both a human-readable assurance case and a machine-readable representation. We demonstrate its application with examples on English-language and code-translation tasks, and discuss implications for theory and practice, targeting engineers, data scientists, users, and auditors.
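A much-simplified picture of a machine-readable assurance case, assuming attacks and defenses are already catalogued: every known attack spawns a sub-claim "robust to this attack", which counts as supported only by a defense that both mitigates it and has evidence attached. This is an illustrative structure, not the paper's argumentation formalism or ontology; `Defense` and `assess` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Defense:
    name: str
    mitigates: set = field(default_factory=set)  # names of attacks it addresses
    evidence: bool = False  # True once the defense has been tested against them


def assess(attacks, defenses):
    """Build a minimal assurance case for the claim "the LLM is robust".

    Each attack yields a sub-claim "robust to <attack>", supported by every
    defense that mitigates it and has evidence; the top-level claim holds
    only if no sub-claim is left unsupported.
    """
    arguments = {
        attack: [d.name for d in defenses if attack in d.mitigates and d.evidence]
        for attack in attacks
    }
    return all(arguments.values()), arguments
```

The returned `arguments` dictionary doubles as a human-readable trace: an empty list pinpoints exactly which sub-claim an auditor should question.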
Improving the portability of predicting students' performance models by using ontologies
Zambrano, Javier Lopez, Lara, Juan A., Romero, Cristobal
One of the main current challenges in Educational Data Mining and Learning Analytics is the portability, or transferability, of predictive models obtained for a particular course so that they can be applied to other courses. To handle this challenge, one of the foremost problems is the models' excessive dependence on the low-level attributes used to train them, which reduces their portability. To solve this issue, the use of high-level attributes with more semantic meaning, such as ontologies, may be very useful. Along this line, we propose the use of an ontology with a taxonomy of actions that summarises students' interactions with the Moodle learning management system. We compare the results of this approach against our previous results, obtained with low-level raw attributes taken directly from Moodle logs. The results indicate that the proposed ontology improves the portability of the models in terms of predictive accuracy. The main contribution of this paper is to show that the ontological models obtained in one source course can be applied to different target courses with similar usage levels without losing prediction accuracy.
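The portability argument rests on replacing course-specific log events with a fixed set of high-level categories. The sketch below assumes a hypothetical taxonomy (`TAXONOMY`, with made-up event names that may not match real Moodle event identifiers or the paper's ontology) and shows how one student's raw events collapse into a feature vector whose schema is identical across courses.

```python
from collections import Counter

# Hypothetical taxonomy: raw log event names -> high-level action categories.
# Real Moodle event names and the paper's taxonomy may differ.
TAXONOMY = {
    "forum_post_created": "communication",
    "forum_discussion_viewed": "communication",
    "quiz_attempt_submitted": "assessment",
    "assignment_submitted": "assessment",
    "resource_viewed": "content_access",
    "page_viewed": "content_access",
}
CATEGORIES = sorted(set(TAXONOMY.values()))


def high_level_features(events):
    """Count one student's events per ontology category (unknown events ignored)."""
    counts = Counter(TAXONOMY[e] for e in events if e in TAXONOMY)
    return [counts.get(c, 0) for c in CATEGORIES]
```

Because every course maps onto the same `CATEGORIES`, a model trained on one course's vectors can be applied to another course's vectors without retraining on new low-level attributes.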
Bottom-up Anytime Discovery of Generalised Multimodal Graph Patterns for Knowledge Graphs
Wilcke, Xander, Mourits, Rick, Rijpma, Auke, Zijdeman, Richard
Vast amounts of heterogeneous knowledge are becoming publicly available in the form of knowledge graphs, often linking multiple sources of data that have never been combined before, thereby enabling scholars to answer many new research questions. It is often not known beforehand, however, which questions the data might answer, potentially leaving many interesting and novel insights undiscovered. To support scholars during this scientific workflow, we introduce an anytime algorithm for the bottom-up discovery of generalized multimodal graph patterns in knowledge graphs. Each pattern is a conjunction of binary statements with (data-) type variables, constants, and/or value patterns. Upon discovery, the patterns are converted to SPARQL queries and presented in an interactive facet browser together with metadata and provenance information, enabling scholars to explore, analyse, and share queries. We evaluate our method from a user perspective, with the help of domain experts in the humanities.
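Converting a discovered pattern into SPARQL is mostly a matter of emitting each binary statement as a triple pattern in a `WHERE` clause. The sketch below assumes a pattern is a list of `(subject, predicate, object)` tuples in which terms starting with `?` are variables and `http(s)://` terms are IRIs; it ignores the typed variables and value patterns mentioned above, and `pattern_to_sparql` is a hypothetical name.

```python
def sparql_term(t):
    """Render one pattern term: variables as-is, IRIs in angle brackets."""
    if t.startswith("?"):
        return t
    if t.startswith(("http://", "https://")):
        return f"<{t}>"
    return t  # assumed to already be a valid SPARQL literal


def pattern_to_sparql(pattern):
    """Turn a conjunction of (s, p, o) statements into a SELECT query."""
    variables = sorted({t for triple in pattern for t in triple if t.startswith("?")})
    body = " .\n  ".join(" ".join(sparql_term(t) for t in triple) for triple in pattern)
    return f"SELECT {' '.join(variables)} WHERE {{\n  {body} .\n}}"
```

For a two-statement pattern about people with a given occupation and their birthplace, this yields a query with one triple pattern per statement and both variables projected.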
Open Digital Rights Enforcement Framework (ODRE): from descriptive to enforceable policies
Cimmino, Andrea, Cano-Benito, Juan, García-Castro, Raúl
From centralised platforms to decentralised ecosystems such as Data Spaces, sharing data has become a paramount challenge. For this reason, the definition of data usage policies has become crucial in these domains, highlighting the need for effective policy enforcement mechanisms. The Open Digital Rights Language (ODRL) is a W3C standard ontology designed to describe data usage policies; however, it lacks built-in enforcement capabilities, limiting its practical application. This paper introduces the Open Digital Rights Enforcement (ODRE) framework, whose goal is to provide ODRL with enforcement capabilities. The ODRE framework proposes a novel approach to expressing ODRL policies that integrates the descriptive ontology terms of ODRL with other languages that allow behaviour specification, such as dynamic data handling or function evaluation. The framework includes an enforcement algorithm for ODRL policies and two open-source implementations, in Python and Java. The ODRE framework is also designed to support future extensions of ODRL to specific domain scenarios. In addition, current limitations of ODRE and ODRL, as well as open challenges, are reported. Finally, several experiments demonstrating the enforcement capabilities of the implementations, their performance, and their extensibility features have been carried out with positive results.
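To illustrate the gap between describing and enforcing a policy, here is a minimal evaluator for ODRL-style permissions with constraints. It is a sketch under simplifying assumptions, not the ODRE algorithm: left operands are looked up in a request `context`, prohibitions and duties are ignored, and the `rightOperand` is a Python `date` instead of a typed JSON-LD literal. The operator names `eq`, `neq`, `lt`, `lteq`, `gt`, `gteq` and the left operand `dateTime` come from the ODRL vocabulary; the policy URIs and function names are hypothetical.

```python
import operator
from datetime import date

# A subset of ODRL constraint operators mapped to Python comparisons.
OPERATORS = {
    "eq": operator.eq, "neq": operator.ne,
    "lt": operator.lt, "lteq": operator.le,
    "gt": operator.gt, "gteq": operator.ge,
}


def constraint_holds(constraint, context):
    """Evaluate one constraint; leftOperand values come from the request context."""
    left = context[constraint["leftOperand"]]
    return OPERATORS[constraint["operator"]](left, constraint["rightOperand"])


def is_permitted(policy, action, target, context):
    """True if some permission matches the request and all its constraints hold."""
    for rule in policy.get("permission", []):
        if rule.get("action") == action and rule.get("target") == target:
            if all(constraint_holds(c, context) for c in rule.get("constraint", [])):
                return True
    return False


policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    "uid": "http://example.org/policy/1",
    "permission": [{
        "target": "http://example.org/dataset/sensors",
        "action": "use",
        # Simplification: rightOperand is a Python date, not a typed literal.
        "constraint": [{"leftOperand": "dateTime", "operator": "lt",
                        "rightOperand": date(2030, 1, 1)}],
    }],
}
```

Resolving left operands such as `dateTime` from live data, rather than from a prepared dictionary, is exactly where a behaviour-specification layer like the one described above would come in.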
Unveiling Ontological Commitment in Multi-Modal Foundation Models
Keser, Mert, Schwalbe, Gesina, Amini-Naieni, Niki, Rottmann, Matthias, Knoll, Alois
Ontological commitment, i.e., the concepts, relations, and assumptions used, is a cornerstone of qualitative reasoning (QR) models. The state of the art for processing raw inputs, though, is deep neural networks (DNNs), nowadays often based on multimodal foundation models. These automatically learn rich representations of concepts and the corresponding reasoning. Unfortunately, the learned qualitative knowledge is opaque, preventing easy inspection, validation, or adaptation against available QR models. So far, it is possible to associate pre-defined concepts with latent representations of DNNs, but extractable relations are mostly limited to semantic similarity. As a next step towards QR for the validation and verification of DNNs, we propose a method that extracts the learned superclass hierarchy from a multimodal DNN for a given set of leaf concepts. Under the hood, we (1) obtain leaf concept embeddings using the DNN's textual input modality; (2) apply hierarchical clustering to them, exploiting the fact that DNNs encode semantic similarities via vector distances; and (3) label the resulting parent concepts using search in available ontologies from QR. An initial evaluation study shows that meaningful ontological class hierarchies can be extracted from state-of-the-art foundation models. Furthermore, we demonstrate how to validate and verify a DNN's learned representations against given ontologies. Lastly, we discuss potential future applications in the context of QR.
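The three steps can be sketched end to end with toy data. This sketch assumes hand-made 3-d embeddings in place of the DNN's text encoder, uses average-linkage agglomerative clustering on cosine distance for step (2), and, for step (3), names each merged cluster by its lowest common ancestor in a small reference ontology. The lowest-common-ancestor labeling is an assumption standing in for the paper's ontology search, and all names and data are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical leaf-concept embeddings (step 1 would obtain these from the
# DNN's textual input modality).
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}
# Hypothetical reference ontology (child -> parent) used to name clusters.
PARENT = {"cat": "mammal", "dog": "mammal", "car": "vehicle",
          "truck": "vehicle", "mammal": "entity", "vehicle": "entity"}


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def average_linkage(a, b):
    """Mean pairwise cosine distance between two clusters of leaf concepts."""
    total = sum(cosine_distance(EMBEDDINGS[x], EMBEDDINGS[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def ancestors(concept):
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def label(leaves):
    """Name a cluster by the lowest common ancestor of its leaves (None if absent)."""
    common = set.intersection(*(set(ancestors(leaf)) for leaf in leaves))
    return next((c for c in ancestors(leaves[0]) if c in common), None)


def extract_hierarchy(leaves):
    """Agglomerative clustering (step 2) plus labeling (step 3) of each merge."""
    clusters = [[leaf] for leaf in leaves]
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        merges.append((sorted(merged), label(merged)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

Comparing the merges against the reference ontology (for instance, checking whether "cat" and "dog" merge before either joins a vehicle) is one simple way to validate a learned hierarchy.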
Probing Omissions and Distortions in Transformer-based RDF-to-Text Models
Faille, Juliette, Gatt, Albert, Gardent, Claire
In Natural Language Generation (NLG), important information is sometimes omitted in the output text. To better understand and analyse how this type of mistake arises, we focus on RDF-to-Text generation and explore two methods of probing omissions in the encoder output of BART (Lewis et al., 2020) and of T5 (Raffel et al., 2019): (i) a novel parameter-free probing method based on the computation of cosine similarity between embeddings of RDF graphs and of RDF graphs in which we removed some entities and (ii) a parametric probe which performs binary classification on the encoder embeddings to detect omitted entities. We also extend our analysis to distorted entities, i.e., entities that are not fully correctly mentioned in the generated text (e.g., a misspelled entity or wrong units of measurement). We found that both omitted and distorted entities can be probed in the encoder's output embeddings. This suggests that the encoder emits a weaker signal for these entities and is therefore responsible for some loss of information. This also shows that probing methods can be used to detect mistakes in the output of NLG models.
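The parameter-free probe (i) compares the embedding of the full graph with the embedding of the same graph minus one entity: if removing an entity barely moves the embedding, the encoder carried little signal for it. The sketch below assumes mean-pooled toy token vectors in place of BART or T5 encoder outputs; `TOKEN_VECTORS`, `encode`, and `omission_scores` are hypothetical, and "NASA" is deliberately given a tiny vector to mimic a weakly encoded entity.

```python
import math

# Hypothetical per-token vectors standing in for encoder outputs; "NASA" is
# given a tiny vector to simulate an entity the encoder barely represents.
TOKEN_VECTORS = {
    "Alan_Bean": [1.0, 0.0, 0.0],
    "Test_pilot": [0.0, 0.0, 1.0],
    "NASA": [0.0, 0.05, 0.0],
    "occupation": [0.5, 0.5, 0.5],
    "employer": [0.5, 0.5, 0.5],
}


def encode(triples, removed=None):
    """Mean-pool token vectors of a linearized graph, skipping a removed entity."""
    tokens = [t for triple in triples for t in triple if t != removed]
    dims = len(next(iter(TOKEN_VECTORS.values())))
    return [sum(TOKEN_VECTORS[t][d] for t in tokens) / len(tokens) for d in range(dims)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def omission_scores(triples, entities):
    """Similarity between the full-graph and entity-removed embeddings.

    A score close to 1 means removing the entity barely changes the
    encoding, i.e. a weak signal that hints at a likely omission.
    """
    full = encode(triples)
    return {e: cosine(full, encode(triples, removed=e)) for e in entities}


graph = [("Alan_Bean", "occupation", "Test_pilot"), ("Alan_Bean", "employer", "NASA")]
scores = omission_scores(graph, ["Alan_Bean", "Test_pilot", "NASA"])
```

In this toy graph "NASA" receives the highest score, so it would be flagged as the entity most at risk of being omitted; with a real encoder, a threshold on these scores would need to be calibrated.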
HeadCT-ONE: Enabling Granular and Controllable Automated Evaluation of Head CT Radiology Report Generation
Acosta, Julián N., Zhang, Xiaoman, Dogra, Siddhant, Zhou, Hong-Yu, Payabvash, Seyedmehdi, Falcone, Guido J., Oermann, Eric K., Rajpurkar, Pranav
We present Head CT Ontology Normalized Evaluation (HeadCT-ONE), a metric for evaluating head CT report generation through ontology-normalized entity and relation extraction. HeadCT-ONE enhances current metrics derived from information extraction (such as RadGraph F1) by implementing entity normalization through domain-specific ontologies, addressing the variability of radiological language. HeadCT-ONE compares normalized entities and relations, allowing for controllable weighting of different entity types or specific entities. Through experiments on head CT reports from three health systems, we show that HeadCT-ONE's normalization and weighting approach improves the capture of semantically equivalent reports, better distinguishes between normal and abnormal reports, and aligns with radiologists' assessment of clinically significant errors, while offering flexibility to prioritize specific aspects of report content. Our results demonstrate how HeadCT-ONE enables more flexible, controllable, and granular automated evaluation of head CT reports.
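The effect of normalization plus weighting can be seen in a small entity-only sketch. It assumes a hypothetical table (`NORMALIZE`, with made-up concept IDs) mapping surface forms to ontology concepts, and computes a weighted F1 over the normalized entity sets. Relations, which HeadCT-ONE also compares, are left out, and this is not the metric's actual implementation.

```python
# Hypothetical normalization table mapping surface forms to ontology concept
# IDs; a real system would draw these from a domain-specific ontology.
NORMALIZE = {
    "hemorrhage": "C_HEMORRHAGE",
    "haemorrhage": "C_HEMORRHAGE",
    "bleed": "C_HEMORRHAGE",
    "infarct": "C_INFARCT",
    "midline shift": "C_MIDLINE_SHIFT",
}


def normalize(mentions):
    """Map each entity mention to a concept ID (unknown mentions kept as text)."""
    return {NORMALIZE.get(m.lower(), m.lower()) for m in mentions}


def weighted_f1(reference, generated, weights=None, default_weight=1.0):
    """F1 over normalized entity sets, where each concept counts with its weight."""
    weights = weights or {}

    def total_weight(concepts):
        return sum(weights.get(c, default_weight) for c in concepts)

    ref, gen = normalize(reference), normalize(generated)
    tp = total_weight(ref & gen)
    if tp == 0:
        return 0.0
    precision, recall = tp / total_weight(gen), tp / total_weight(ref)
    return 2 * precision * recall / (precision + recall)
```

Without normalization, "hemorrhage" and "bleed" would count as a mismatch; raising the weight of a clinically critical concept makes its presence or absence dominate the score.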