Ontologies
Developing an Ontology for AI Act Fundamental Rights Impact Assessments
Rintamaki, Tytti, Pandit, Harshvardhan J.
The recently published EU Artificial Intelligence Act (AI Act) is a landmark regulation that regulates the use of AI technologies. One of its novel requirements is the obligation to conduct a Fundamental Rights Impact Assessment (FRIA), where organisations in the role of deployers must assess the risks of their AI system regarding health, safety, and fundamental rights. Another novelty in the AI Act is the requirement to create a questionnaire and an automated tool to support organisations in their FRIA obligations. Such automated tools will require a machine-readable form of information involved within the FRIA process, and additionally also require machine-readable documentation to enable further compliance tools to be created. In this article, we present our novel representation of the FRIA as an ontology based on semantic web standards. Our work builds upon the existing state of the art, notably the Data Privacy Vocabulary (DPV), where similar works have been established to create tools for GDPR's Data Protection Impact Assessments (DPIA) and other obligations. Through our ontology, we enable the creation and management of FRIA, and the use of automated tool in its various steps.
Clinical Trials Ontology Engineering with Large Language Models
Managing clinical trial information is currently a significant challenge for the medical industry, as traditional methods are both time-consuming and costly. This paper proposes a simple yet effective methodology to extract and integrate clinical trial data in a cost-effective and time-efficient manner. Allowing the medical industry to stay up-to-date with medical developments. Comparing time, cost, and quality of the ontologies created by humans, GPT3.5, GPT4, and Llama3 (8b & 70b). Findings suggest that large language models (LLM) are a viable option to automate this process both from a cost and time perspective. This study underscores significant implications for medical research where real-time data integration from clinical trials could become the norm.
The Multiplex Classification Framework: optimizing multi-label classifiers through problem transformation, ontology engineering, and model ensembling
Offidani, Mauro Nievas, Roffet, Facundo, Delrieux, Claudio Augusto, Galtier, Maria Carolina Gonzalez, Zarate, Marcos
Classification is a fundamental task in machine learning. While conventional methods--such as binary, multiclass, and multi-label classification--are effective for simpler problems, they may not adequately address the complexities of some real-world scenarios. This paper introduces the Multiplex Classification Framework, a novel approach developed to tackle these and similar challenges through the integration of problem transformation, ontology engineering, and model ensembling. The framework offers several advantages, including adaptability to any number of classes and logical constraints, an innovative method for managing class imbalance, the elimination of confidence threshold selection, and a modular structure. Two experiments were conducted to compare the performance of conventional classification models with the Multiplex approach. Our results demonstrate that the Multiplex approach can improve classification performance significantly (up to 10% gain in overall F1 score), particularly in classification problems with a large number of classes and pronounced class imbalances. However, it also has limitations, as it requires a thorough understanding of the problem domain and some experience with ontology engineering, and it involves training multiple models, which can make the whole process more intricate. Overall, this methodology provides a valuable tool for researchers and practitioners dealing with complex classification problems in machine learning.
Enhancing Rhetorical Figure Annotation: An Ontology-Based Web Application with RAG Integration
Kühn, Ramona, Mitrović, Jelena, Granitzer, Michael
Rhetorical figures play an important role in our communication. They are used to convey subtle, implicit meaning, or to emphasize statements. We notice them in hate speech, fake news, and propaganda. By improving the systems for computational detection of rhetorical figures, we can also improve tasks such as hate speech and fake news detection, sentiment analysis, opinion mining, or argument mining. Unfortunately, there is a lack of annotated data, as well as qualified annotators that would help us build large corpora to train machine learning models for the detection of rhetorical figures. The situation is particularly difficult in languages other than English, and for rhetorical figures other than metaphor, sarcasm, and irony. To overcome this issue, we develop a web application called "Find your Figure" that facilitates the identification and annotation of German rhetorical figures. The application is based on the German Rhetorical ontology GRhOOT which we have specially adapted for this purpose. In addition, we improve the user experience with Retrieval Augmented Generation (RAG). In this paper, we present the restructuring of the ontology, the development of the web application, and the built-in RAG pipeline. We also identify the optimal RAG settings for our application. Our approach is one of the first to practically use rhetorical ontologies in combination with RAG and shows promising results.
DODGE: Ontology-Aware Risk Assessment via Object-Oriented Disruption Graphs
Nicoletti, Stefano M., Hahn, E. Moritz, Fumagalli, Mattia, Guizzardi, Giancarlo, Stoelinga, Mariëlle
When considering risky events or actions, we must not downplay the role of involved objects: a charged battery in our phone averts the risk of being stranded in the desert after a flat tyre, and a functional firewall mitigates the risk of a hacker intruding the network. The Common Ontology of Value and Risk (COVER) highlights how the role of objects and their relationships remains pivotal to performing transparent, complete and accountable risk assessment. In this paper, we operationalize some of the notions proposed by COVER -- such as parthood between objects and participation of objects in events/actions -- by presenting a new framework for risk assessment: DODGE. DODGE enriches the expressivity of vetted formal models for risk -- i.e., fault trees and attack trees -- by bridging the disciplines of ontology and formal methods into an ontology-aware formal framework composed by a more expressive modelling formalism, Object-Oriented Disruption Graphs (ODGs), logic (ODGLog) and an intermediate query language (ODGLang). With these, DODGE allows risk assessors to pose questions about disruption propagation, disruption likelihood and risk levels, keeping the fundamental role of objects at risk always in sight.
Discerning and Characterising Types of Competency Questions for Ontologies
Keet, C. Maria, Khan, Zubeida Casmod
Competency Questions (CQs) are widely used in ontology development by guiding, among others, the scoping and validation stages. However, very limited guidance exists for formulating CQs and assessing whether they are good CQs, leading to issues such as ambiguity and unusable formulations. To solve this, one requires insight into the nature of CQs for ontologies and their constituent parts, as well as which ones are not. We aim to contribute to such theoretical foundations in this paper, which is informed by analysing questions, their uses, and the myriad of ontology development tasks. This resulted in a first Model for Competency Questions, which comprises five main types of CQs, each with a different purpose: Scoping (SCQ), Validating (VCQ), Foundational (FCQ), Relationship (RCQ), and Metaproperty (MpCQ) questions. This model enhances the clarity of CQs and therewith aims to improve on the effectiveness of CQs in ontology development, thanks to their respective identifiable distinct constituent elements. We illustrate and evaluate them with a user story and demonstrate where which type can be used in ontology development tasks. To foster use and research, we created an annotated repository of 438 CQs, the Repository of Ontology Competency QuestionS (ROCQS), incorporating an existing CQ dataset and new CQs and CQ templates, which further demonstrate distinctions among types of CQs.
Knowledge Graphs: The Future of Data Integration and Insightful Discovery
Mohamed, Saher, Farah, Kirollos, Lotfy, Abdelrahman, Rizk, Kareem, Saeed, Abdelrahman, Mohamed, Shahenda, Khouriba, Ghada, Arafa, Tamer
Knowledge graphs are an efficient method for representing and connecting information across various concepts, useful in reasoning, question answering, and knowledge base completion tasks. They organize data by linking points, enabling researchers to combine diverse information sources into a single database. This interdisciplinary approach helps uncover new research questions and ideas. Knowledge graphs create a web of data points (nodes) and their connections (edges), which enhances navigation, comprehension, and utilization of data for multiple purposes. They capture complex relationships inherent in unstructured data sources, offering a semantic framework for diverse entities and their attributes. Strategies for developing knowledge graphs include using seed data, named entity recognition, and relationship extraction. These graphs enhance chatbot accuracy and include multimedia data for richer information. Creating high-quality knowledge graphs involves both automated methods and human oversight, essential for accurate and comprehensive data representation.
"They've Stolen My GPL-Licensed Model!": Toward Standardized and Transparent Model Licensing
Duan, Moming, Zhao, Rui, Jiang, Linshan, Shadbolt, Nigel, He, Bingsheng
As model parameter sizes reach the billion-level range and their training consumes zettaFLOPs of computation, components reuse and collaborative development are become increasingly prevalent in the Machine Learning (ML) community. These components, including models, software, and datasets, may originate from various sources and be published under different licenses, which govern the use and distribution of licensed works and their derivatives. However, commonly chosen licenses, such as GPL and Apache, are software-specific and are not clearly defined or bounded in the context of model publishing. Meanwhile, the reused components may also have free-content licenses and model licenses, which pose a potential risk of license noncompliance and rights infringement within the model production workflow. In this paper, we propose addressing the above challenges along two lines: 1) For license analysis, we have developed a new vocabulary for ML workflow management and encoded license rules to enable ontological reasoning for analyzing rights granting and compliance issues. 2) For standardized model publishing, we have drafted a set of model licenses that provide flexible options to meet the diverse needs of model publishing. Our analysis tool is built on Turtle language and Notation3 reasoning engine, envisioned as a first step toward Linked Open Model Production Data. We have also encoded our proposed model licenses into rules and demonstrated the effects of GPL and other commonly used licenses in model publishing, along with the flexibility advantages of our licenses, through comparisons and experiments.
Structured Extraction of Real World Medical Knowledge using LLMs for Summarization and Search
Kim, Edward, Shrestha, Manil, Foty, Richard, DeLay, Tom, Seyfert-Margolis, Vicki
Creation and curation of knowledge graphs can accelerate disease discovery and analysis in real-world data. While disease ontologies aid in biological data annotation, codified categories (SNOMED-CT, ICD10, CPT) may not capture patient condition nuances or rare diseases. Multiple disease definitions across data sources complicate ontology mapping and disease clustering. We propose creating patient knowledge graphs using large language model extraction techniques, allowing data extraction via natural language rather than rigid ontological hierarchies. Our method maps to existing ontologies (MeSH, SNOMED-CT, RxNORM, HPO) to ground extracted entities. Using a large ambulatory care EHR database with 33.6M patients, we demonstrate our method through the patient search for Dravet syndrome, which received ICD10 recognition in October 2020. We describe our construction of patient-specific knowledge graphs and symptom-based patient searches. Using confirmed Dravet syndrome ICD10 codes as ground truth, we employ LLM-based entity extraction to characterize patients in grounded ontologies. We then apply this method to identify Beta-propeller protein-associated neurodegeneration (BPAN) patients, demonstrating real-world discovery where no ground truth exists.
AdaptLIL: A Gaze-Adaptive Visualization for Ontology Mapping
This paper showcases AdaptLIL, a real-time adaptive link-indented list ontology mapping visualization that uses eye gaze as the primary input source. Through a multimodal combination of real-time systems, deep learning, and web development applications, this system uniquely curtails graphical overlays (adaptations) to pairwise mappings of link-indented list ontology visualizations for individual users based solely on their eye gaze.