 Ontologies


Situated Ground Truths: Enhancing Bias-Aware AI by Situating Data Labels with SituAnnotate

arXiv.org Artificial Intelligence

In the contemporary world of AI and data-driven applications, supervised machines often derive their understanding, which they mimic and reproduce, through annotations--typically conveyed in the form of words or labels. However, such annotations are often divorced from or lack contextual information, and as such hold the potential to inadvertently introduce biases when subsequently used for training. This paper introduces SituAnnotate, a novel ontology explicitly crafted for 'situated grounding,' aiming to anchor the ground truth data employed in training AI systems within the contextual and culturally-bound situations from which those ground truths emerge. SituAnnotate offers an ontology-based approach to structured and context-aware data annotation, addressing potential bias issues associated with isolated annotations. Its representational power encompasses situational context, including annotator details, timing, location, remuneration schemes, annotation roles, and more, ensuring semantic richness. Aligned with the foundational Dolce Ultralight ontology, it provides a robust and consistent framework for knowledge representation. As a method to create, query, and compare label-based datasets, SituAnnotate empowers downstream AI systems to undergo training with explicit consideration of context and cultural bias, laying the groundwork for enhanced system interpretability and adaptability, and enabling AI models to align with a multitude of cultural contexts and viewpoints.


Coupling Machine Learning with Ontology for Robotics Applications

arXiv.org Artificial Intelligence

In this paper I present a practical approach for coupling machine learning (ML) algorithms with the knowledge base (KB) ontology formalism. The lack of prior knowledge in dynamic scenarios is without doubt a major barrier to scalable machine intelligence. My view of the interaction between the two tiers of intelligence is based on the idea that when knowledge is not readily available at the knowledge base tier, more knowledge can be extracted from the other tier, which has access to trained models from machine learning algorithms. My analysis shows that the two-tier intelligence approach for coupling ML and KB is computationally valid, and that the time complexity of the algorithms during the robot mission is linear in the size of the data and knowledge. Key words: trust AI; machine learning; neural; symbolic systems. 1. Introduction. Trust in the reliability and resilience of autonomous systems is paramount to their continued growth, as well as their safe and effective utilization. The ontology scope of these prior works varies, and it depends on the functionalities of the target robotic system, i.e. the concepts modelled in the ontology relate to object names, environment, affordance, action and task, activity and behaviour, plan and method, capability and skill, hardware components, software components, interaction, and communication. This knowledge-enabled architecture provides a means of sharing knowledge via the ontology, between different robots and between different subsystems of a single robot's control system, in a machine-understandable and consistent presentation.
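
As an illustration only (this is not the paper's implementation), the two-tier idea described above can be sketched as a lookup that consults a knowledge base first and falls back to a trained model when the fact is missing. The class `TwoTierKnowledge`, the function `toy_model`, and the example facts are all hypothetical names introduced here; a plain dictionary stands in for the ontology.

```python
from typing import Callable, Dict, Optional


class TwoTierKnowledge:
    """Hypothetical two-tier lookup: knowledge base first, ML model as fallback."""

    def __init__(self, kb: Dict[str, str], model: Callable[[str], Optional[str]]):
        self.kb = kb            # tier 1: explicit facts (a dict stands in for an ontology)
        self.model = model      # tier 2: stand-in for a trained ML model
        self.model_calls = 0    # counts how often tier 2 was needed

    def query(self, entity: str) -> Optional[str]:
        # Tier 1: answer from the knowledge base when the fact is present.
        if entity in self.kb:
            return self.kb[entity]
        # Tier 2: otherwise ask the model and store any answer it gives.
        self.model_calls += 1
        answer = self.model(entity)
        if answer is not None:
            self.kb[entity] = answer
        return answer


def toy_model(entity: str) -> Optional[str]:
    # Placeholder rule; a real system would run a trained classifier here.
    return "graspable" if entity.endswith("cup") else None


store = TwoTierKnowledge({"door": "openable"}, toy_model)
print(store.query("door"))      # openable (found in the knowledge base)
print(store.query("red cup"))   # graspable (extracted from the model tier)
print(store.query("red cup"))   # graspable (now answered by the knowledge base)
print(store.model_calls)        # 1
```

In this sketch each query costs one dictionary lookup plus at most one model call, so a batch of queries grows linearly with the number of queries; whether this matches the complexity analysis in the paper is not something the abstract confirms.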


Constructive Interpolation and Concept-Based Beth Definability for Description Logics via Sequents

arXiv.org Artificial Intelligence

We introduce a constructive method applicable to a large number of description logics (DLs) for establishing the concept-based Beth definability property (CBP) based on sequent systems. Using the highly expressive DL RIQ as a case study, we introduce novel sequent calculi for RIQ-ontologies and show how certain interpolants can be computed from sequent calculus proofs, which permit the extraction of explicit definitions of implicitly definable concepts. To the best of our knowledge, this is the first sequent-based approach to computing interpolants and definitions within the context of DLs, as well as the first proof that RIQ enjoys the CBP. Moreover, due to the modularity of our sequent systems, our results hold for any restriction of RIQ, and are applicable to other DLs by suitable modifications.


Digital twins in sport: Concepts, Taxonomies, Challenges and Practical Potentials

arXiv.org Artificial Intelligence

Digital twins are among the ten strategic technology trends on the Gartner list from 2019, and have expanded considerably, especially with the introduction of Industry 4.0. Sport, on the other hand, has become a constant companion of modern humans, who suffer from the lack of a healthy way of life. The application of digital twins in sport has brought dramatic changes not only to sport training, but also to managing athletes during competitions and to coaches' search for strategic solutions before games and tactical solutions during them. In this paper, the domain of digital twins in sport is reviewed based on papers that have emerged in this area. First, the concept of a digital twin is discussed in general. Then, taxonomies of digital twins are defined. According to these taxonomies, the collection of relevant papers is analyzed, and some real examples of digital twins are presented. The review finishes with a discussion of how digital twins are changing modern sport disciplines, and what challenges and opportunities await digital twins in the future.


Qabas: An Open-Source Arabic Lexicographic Database

arXiv.org Artificial Intelligence

We present Qabas, a novel open-source Arabic lexicon designed for NLP applications. The novelty of Qabas lies in its synthesis of 110 lexicons. Specifically, Qabas lexical entries (lemmas) are assembled by linking lemmas from 110 lexicons. Furthermore, Qabas lemmas are also linked to 12 morphologically annotated corpora (about 2M tokens), making it the first Arabic lexicon to be linked to lexicons and corpora. Qabas was developed semi-automatically, utilizing a mapping framework and a web-based tool. Compared with other lexicons, Qabas stands as the most extensive Arabic lexicon, encompassing about 58K lemmas (45K nominal lemmas, 12.5K verbal lemmas, and 473 functional-word lemmas). Qabas is open-source and accessible online at https://sina.birzeit.edu/qabas.


RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records

arXiv.org Artificial Intelligence

We present RAM-EHR, a Retrieval AugMentation pipeline to improve clinical predictions on Electronic Health Records (EHRs). RAM-EHR first collects multiple knowledge sources, converts them into text format, and uses dense retrieval to obtain information related to medical concepts. This strategy addresses the difficulties associated with complex names for the concepts. RAM-EHR then augments the local EHR predictive model co-trained with consistency regularization to capture complementary information from patient visits and summarized knowledge. Experiments on two EHR datasets show the efficacy of RAM-EHR over previous knowledge-enhanced baselines (3.4% gain in AUROC and 7.2% gain in AUPR), emphasizing the effectiveness of the summarized knowledge from RAM-EHR for clinical prediction tasks. The code will be published at https://github.com/ritaranx/RAM-EHR.


Towards an ontology of portions of matter to support multi-scale analysis and provenance tracking

arXiv.org Artificial Intelligence

This paper presents an ontology of portions of matter with practical implications across scientific and industrial domains. The ontology is developed under the Unified Foundational Ontology (UFO), which uses the concept of quantity to represent topologically maximally self-connected portions of matter. The proposed ontology introduces the granuleOf parthood relation, holding between objects and portions of matter. It also discusses the constitution of quantities by collections of granules, the representation of sub-portions of matter, and the tracking of matter provenance between quantities using historical relations. A case study is then presented to demonstrate the use of the portion of matter ontology in the geology domain for an Oil & Gas industry application. In the case study, we model how to represent the historical relation between an original portion of rock and the sub-portions created during the industrial process. Lastly, future research directions are outlined, including investigating granularity levels and defining a taxonomy of events.


KNOW: A Real-World Ontology for Knowledge Capture with Large Language Models

arXiv.org Artificial Intelligence

We present KNOW--the Knowledge Navigator Ontology for the World--the first ontology designed to capture everyday knowledge to augment large language models (LLMs) in real-world generative AI use cases such as personal AI assistants. Our domain is human life, both its everyday concerns and its major milestones. We have limited the initial scope of the modeled concepts to only established human universals: spacetime (places, events) plus social (people, groups, organizations). The inclusion criteria for modeled concepts are pragmatic, beginning with universality and utility. We compare and contrast previous work such as Schema.org and Cyc--as well as attempts at a synthesis of knowledge graphs and language models--noting how LLMs already encode internally much of the commonsense tacit knowledge that took decades to capture in the Cyc project. We also make available code-generated software libraries for the 12 most popular programming languages, enabling the direct use of ontology concepts in software engineering. We emphasize simplicity and developer experience in promoting AI interoperability.


The Impact of Ontology on the Prediction of Cardiovascular Disease Compared to Machine Learning Algorithms

arXiv.org Artificial Intelligence

Cardiovascular disease is one of the chronic diseases that is on the rise. Complications occur when cardiovascular disease is not discovered early and diagnosed correctly. Various machine learning approaches, including ontology-based machine learning techniques, have lately played an essential role in medical science by building automated systems that can identify heart illness. This paper compares and reviews the most prominent machine learning algorithms, as well as ontology-based machine learning classification. Random Forest, Logistic Regression, Decision Tree, Naive Bayes, k-Nearest Neighbours, Artificial Neural Network, and Support Vector Machine were among the classification methods explored. The dataset used consists of 70,000 instances and can be downloaded from the Kaggle website. The findings are assessed using performance measures generated from the confusion matrix, such as F-Measure, Accuracy, Recall, and Precision. The results showed that the ontology outperformed all the machine learning algorithms.
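
For readers unfamiliar with the measures named above, the following minimal sketch shows how Accuracy, Precision, Recall, and F-Measure follow from the four cells of a binary confusion matrix. The function name `binary_metrics` and the example counts are made up for illustration and are not taken from the paper or its dataset; the F-Measure is assumed to be the balanced F1 score.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Derive standard measures from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts, for illustration only.
scores = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With these counts, precision (0.8) is higher than recall (about 0.667), which is the kind of trade-off the F-Measure is meant to summarize in a single number.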


Towards Ontology-Enhanced Representation Learning for Large Language Models

arXiv.org Artificial Intelligence

Taking advantage of the widespread use of ontologies to organise and harmonize knowledge across several distinct domains, this paper proposes a novel approach to improve an embedding-Large Language Model (embedding-LLM) of interest by infusing the knowledge formalized by a reference ontology: ontological knowledge infusion aims at boosting the ability of the considered LLM to effectively model the knowledge domain described by the infused ontology. The linguistic information (i.e. concept synonyms and descriptions) and structural information (i.e. is-a relations) formalized by the ontology are utilized to compile a comprehensive set of concept definitions, with the assistance of a powerful generative LLM (i.e. GPT-3.5-turbo). These concept definitions are then employed to fine-tune the target embedding-LLM using a contrastive learning framework. To demonstrate and evaluate the proposed approach, we utilize the biomedical disease ontology MONDO. The results show that embedding-LLMs enhanced by ontological disease knowledge exhibit an improved capability to effectively evaluate the similarity of in-domain sentences from biomedical documents mentioning diseases, without compromising their out-of-domain performance.
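
The abstract does not say which contrastive objective is used. As a hedged sketch, the snippet below assumes an InfoNCE-style loss with in-batch negatives, a common choice for fine-tuning embedding models: each anchor embedding (for example, a concept name) should be closer to its own positive (its definition) than to the positives of other concepts. The function names, the temperature value, and the toy vectors are assumptions made for this example.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors: List[Sequence[float]],
             positives: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss; positives[i] is the match for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        # Similarity of this anchor to every candidate in the batch.
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability assigned to the correct candidate.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D embeddings: aligned pairs give a low loss, swapped pairs a high one.
concepts = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(concepts, [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce(concepts, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

In an actual fine-tuning run the embeddings would come from the model being trained and the loss would be minimized by gradient descent; this sketch only evaluates the objective on fixed vectors.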