Goto

Collaborating Authors

 Ontologies


Cell-ontology guided transcriptome foundation model

arXiv.org Artificial Intelligence

Transcriptome foundation models (TFMs) hold great promises of deciphering the transcriptomic language that dictate diverse cell functions by self-supervised learning on large-scale single-cell gene expression data, and ultimately unraveling the complex mechanisms of human diseases. However, current TFMs treat cells as independent samples and ignore the taxonomic relationships between cell types, which are available in cell ontology graphs. We argue that effectively leveraging this ontology information during the TFM pre-training can improve learning biologically meaningful gene co-expression patterns while preserving TFM as a general purpose foundation model for downstream zero-shot and fine-tuning tasks. To this end, we present single cell, Cell-ontology guided TFM (scCello). We introduce cell-type coherence loss and ontology alignment loss, which are minimized along with the masked gene expression prediction loss during the pre-training. The novel loss component guide scCello to learn the cell-type-specific representation and the structural relation between cell types from the cell ontology graph, respectively. We pre-trained scCello on 22 million cells from CellxGene database leveraging their cell-type labels mapped to the cell ontology graph from Open Biological and Biomedical Ontology Foundry. Our TFM demonstrates competitive generalization and transferability performance over the existing TFMs on biologically important tasks including identifying novel cell types of unseen cells, prediction of cell-type-specific marker genes, and cancer drug responses.


An Open Knowledge Graph-Based Approach for Mapping Concepts and Requirements between the EU AI Act and International Standards

arXiv.org Artificial Intelligence

The many initiatives on trustworthy AI result in a confusing and multipolar landscape that organizations operating within the fluid and complex international value chains must navigate in pursuing trustworthy AI. The EU's AI Act will now shift the focus of such organizations toward conformance with the technical requirements for regulatory compliance, for which the Act relies on Harmonized Standards. Though a high-level mapping to the Act's requirements will be part of such harmonization, determining the degree to which standards conformity delivers regulatory compliance with the AI Act remains a complex challenge. Variance and gaps in the definitions of concepts and how they are used in requirements between the Act and harmonized standards may impact the consistency of compliance claims across organizations, sectors, and applications. This may present regulatory uncertainty, especially for SMEs and public sector bodies relying on standards conformance rather than proprietary equivalents for developing and deploying compliant high-risk AI systems. To address this challenge, this paper offers a simple and repeatable mechanism for mapping the terms and requirements relevant to normative statements in regulations and standards, e.g., AI Act and ISO management system standards, texts into open knowledge graphs. This representation is used to assess the adequacy of standards conformance to regulatory compliance and thereby provide a basis for identifying areas where further technical consensus development in trustworthy AI value chains is required to achieve regulatory compliance.


Towards a Knowledge Graph for Models and Algorithms in Applied Mathematics

arXiv.org Artificial Intelligence

Mathematical models and algorithms are an essential part of mathematical research data, as they are epistemically grounding numerical data. In order to represent models and algorithms as well as their relationship semantically to make this research data FAIR, two previously distinct ontologies were merged and extended, becoming a living knowledge graph. The link between the two ontologies is established by introducing computational tasks, as they occur in modeling, corresponding to algorithmic tasks. Moreover, controlled vocabularies are incorporated and a new class, distinguishing base quantities from specific use case quantities, was introduced. Also, both models and algorithms can now be enriched with metadata. Subject-specific metadata is particularly relevant here, such as the symmetry of a matrix or the linearity of a mathematical model. This is the only way to express specific workflows with concrete models and algorithms, as the feasible solution algorithm can only be determined if the mathematical properties of a model are known. We demonstrate this using two examples from different application areas of applied mathematics. In addition, we have already integrated over 250 research assets from applied mathematics into our knowledge graph.


NFDI4DSO: Towards a BFO Compliant Ontology for Data Science

arXiv.org Artificial Intelligence

The NFDI4DataScience (NFDI4DS) project aims to enhance the accessibility and interoperability of research data within Data Science (DS) and Artificial Intelligence (AI) by connecting digital artifacts and ensuring they adhere to FAIR (Findable, Accessible, Interoperable, and Reusable) principles. To this end, this poster introduces the NFDI4DS Ontology, which describes resources in DS and AI and models the structure of the NFDI4DS consortium. Built upon the NFDICore ontology and mapped to the Basic Formal Ontology (BFO), this ontology serves as the foundation for the NFDI4DS knowledge graph currently under development.


Towards a Cyber Information Ontology

arXiv.org Artificial Intelligence

This paper introduces a set of terms that are intended to act as an interface between cyber ontologies (like a file system ontology or a data fusion ontology) and top- and mid-level ontologies, specifically Basic Formal Ontology and the Common Core Ontologies. These terms center on what makes cyberinformation management unique: numerous acts of copying items of information, the aggregates of copies that result from those acts, and the faithful members of those aggregates that represent all other members.


Planning with OWL-DL Ontologies (Extended Version)

arXiv.org Artificial Intelligence

We introduce ontology-mediated planning, in which planning problems are combined with an ontology. Our formalism differs from existing ones in that we focus on a strong separation of the formalisms for describing planning problems and ontologies, which are only losely coupled by an interface. Moreover, we present a black-box algorithm that supports the full expressive power of OWL DL. This goes beyond what existing approaches combining automated planning with ontologies can do, which only support limited description logics such as DL-Lite and description logics that are Horn. Our main algorithm relies on rewritings of the ontology-mediated planning specifications into PDDL, so that existing planning systems can be used to solve them. The algorithm relies on justifications, which allows for a generic approach that is independent of the expressivity of the ontology language. However, dedicated optimizations for computing justifications need to be implemented to enable an efficient rewriting procedure. We evaluated our implementation on benchmark sets from several domains. The evaluation shows that our procedure works in practice and that tailoring the reasoning procedure has significant impact on the performance.


OWL2Vec4OA: Tailoring Knowledge Graph Embeddings for Ontology Alignment

arXiv.org Artificial Intelligence

Ontology alignment is integral to achieving semantic interoperability as the number of available ontologies covering intersecting domains is increasing. This paper proposes OWL2Vec4OA, an extension of the ontology embedding system OWL2Vec*. While OWL2Vec* has emerged as a powerful technique for ontology embedding, it currently lacks a mechanism to tailor the embedding to the ontology alignment task. OWL2Vec4OA incorporates edge confidence values from seed mappings to guide the random walk strategy. We present the theoretical foundations, implementation details, and experimental evaluation of our proposed extension, demonstrating its potential effectiveness for ontology alignment tasks.


eSPARQL: Representing and Reconciling Agnostic and Atheistic Beliefs in RDF-star Knowledge Graphs

arXiv.org Artificial Intelligence

Over the past few years, we have seen the emergence of large knowledge graphs combining information from multiple sources. Sometimes, this information is provided in the form of assertions about other assertions, defining contexts where assertions are valid. A recent extension to RDF which admits statements over statements, called RDF-star, is in revision to become a W3C standard. However, there is no proposal for a semantics of these RDF-star statements nor a built-in facility to operate over them. In this paper, we propose a query language for epistemic RDF-star metadata based on a four-valued logic, called eSPARQL. Our proposed query language extends SPARQL-star, the query language for RDF-star, with a new type of FROM clause to facilitate operating with multiple and sometimes conflicting beliefs. We show that the proposed query language can express four use case queries, including the following features: (i) querying the belief of an individual, (ii) the aggregating of beliefs, (iii) querying who is conflicting with somebody, and (iv) beliefs about beliefs (i.e., nesting of beliefs).


Dialogue Ontology Relation Extraction via Constrained Chain-of-Thought Decoding

arXiv.org Artificial Intelligence

State-of-the-art task-oriented dialogue systems typically rely on task-specific ontologies for fulfilling user queries. The majority of task-oriented dialogue data, such as customer service recordings, comes without ontology and annotation. Such ontologies are normally built manually, limiting the application of specialised systems. Dialogue ontology construction is an approach for automating that process and typically consists of two steps: term extraction and relation extraction. In this work, we focus on relation extraction in a transfer learning set-up. To improve the generalisation, we propose an extension to the decoding mechanism of large language models. We adapt Chain-of-Thought (CoT) decoding, recently developed for reasoning problems, to generative relation extraction. Here, we generate multiple branches in the decoding space and select the relations based on a confidence threshold. By constraining the decoding to ontology terms and relations, we aim to decrease the risk of hallucination. We conduct extensive experimentation on two widely used datasets and find improvements in performance on target ontology for source fine-tuned and one-shot prompted large language models.


Multi-Level Querying using A Knowledge Pyramid

arXiv.org Artificial Intelligence

This paper addresses the need for improved precision in existing Retrieval-Augmented Generation (RAG) methods that primarily focus on enhancing recall. We propose a multi-layer knowledge pyramid approach within the RAG framework to achieve a better balance between precision and recall. The knowledge pyramid consists of three layers: Ontologies, Knowledge Graphs (KGs), and chunk-based raw text. We employ cross-layer augmentation techniques for comprehensive knowledge coverage and dynamic updates of the Ontology schema and instances. To ensure compactness, we utilize cross-layer filtering methods for knowledge condensation in KGs. Our approach, named PolyRAG, follows a waterfall model for retrieval, starting from the top of the pyramid and progressing down until a confident answer is obtained. We introduce two benchmarks for domain-specific knowledge retrieval, one in the academic domain and the other in the financial domain. The effectiveness of the methods has been validated through comprehensive experiments by outperforming 19 SOTA methods. An encouraging observation is that the proposed method has augmented the GPT-4, providing 395\% F1 gain by improving its performance from 0.1636 to 0.8109.