AITopics | Ontologies

Collaborating Authors

Ontologies

"An ontology defines the terms used to describe and represent an area of knowledge. … Ontologies include computer-usable definitions of basic concepts in the domain and the relationships among them."
– from OWL Web Ontology Language Use Cases and Requirements. W3C Recommendation (10 February 2004). Jeff Heflin, editor.

News Overviews Instructional Materials AI-Alerts Classics

An Ontology-Driven Graph RAG for Legal Norms: A Structural, Temporal, and Deterministic Approach

de Martim, Hudson

arXiv.org Artificial IntelligenceSep-12-2025

Retrieval-Augmented Generation (RAG) systems in the legal domain face a critical challenge: standard, flat-text retrieval is blind to the hierarchical, diachronic, and causal structure of law, leading to anachronistic and unreliable answers. This paper introduces the Structure-Aware Temporal Graph RAG (SAT-Graph RAG), an ontology-driven framework designed to overcome these limitations by explicitly modeling the formal structure and diachronic nature of legal norms. We ground our knowledge graph in a formal, LRMoo-inspired model that distinguishes abstract legal Works from their versioned Expressions. We model temporal states as efficient aggregations that reuse the versioned expressions (CTVs) of unchanged components, and we reify legislative events as first-class Action nodes to make causality explicit and queryable. This structured backbone enables a unified, planner-guided query strategy that applies explicit policies to deterministically resolve complex requests for (i) point-in-time retrieval, (ii) hierarchical impact analysis, and (iii) auditable provenance reconstruction. Through a case study on the Brazilian Constitution, we demonstrate how this approach provides a verifiable, temporally-correct substrate for LLMs, enabling higher-order analytical capabilities while drastically reducing the risk of factual errors. The result is a practical framework for building more trustworthy and explainable legal AI systems.

large language model, machine learning, question answering, (23 more...)

arXiv.org Artificial Intelligence

2505.00039

Country: South America > Brazil (0.49)

Genre: Research Report (0.50)

Industry:

Law > Statutes (0.68)
Information Technology > Security & Privacy (0.68)
Government > Regional Government > South America Government > Brazil Government (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.94)
(2 more...)

Add feedback

KROMA: Ontology Matching with Knowledge Retrieval and Large Language Models

Nguyen, Lam, Barcelos, Erika, French, Roger, Wu, Yinghui

arXiv.org Artificial IntelligenceSep-12-2025

Ontology Matching (OM) is a cornerstone task of semantic interoperability, yet existing systems often rely on handcrafted rules or specialized models with limited adaptability. We present KROMA, a novel OM framework that harnesses Large Language Models (LLMs) within a Retrieval-Augmented Generation (RAG) pipeline, to dynamically enrich the semantic context of OM tasks with structural, lexical, and definitional knowledge. To optimize both performance and efficiency, KROMA integrates a bisimilarity-based concept matching and a lightweight ontology refinement step, which prune candidate concepts and substantially reduce the communication overhead from invoking LLMs. Through experiments on multiple benchmark datasets, we show that integrating knowledge retrieval with context-augmented LLMs significantly enhances ontology matching--outperforming both classic OM systems and cutting-edge LLM-based approaches--while keeping communication overhead comparable. Our study highlights the feasibility and benefit of the proposed optimization techniques (targeted knowledge retrieval, prompt enrichment, and ontology refinement) for ontology matching at scale. Our code and experimental dataset has been made available at: https://github.com/lamng3/kroma

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2507.14032

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Tripartite-GraphRAG via Plugin Ontologies

Banf, Michael, Kuhn, Johannes

arXiv.org Artificial IntelligenceSep-10-2025

Large Language Models (LLMs) have shown remarkable capabilities across various domains, yet they struggle with knowledge-intensive tasks in areas that demand factual accuracy, e.g. industrial automation and healthcare. Key limitations include their tendency to hallucinate, lack of source traceability (provenance), and challenges in timely knowledge updates. Combining language models with knowledge graphs (GraphRAG) offers promising avenues for overcoming these deficits. However, a major challenge lies in creating such a knowledge graph in the first place. Here, we propose a novel approach that combines LLMs with a tripartite knowledge graph representation, which is constructed by connecting complex, domain-specific objects via a curated ontology of corresponding, domain-specific concepts to relevant sections within chunks of text through a concept-anchored pre-analysis of source documents starting from an initial lexical graph. Subsequently, we formulate LLM prompt creation as an unsupervised node classification problem allowing for the optimization of information density, coverage, and arrangement of LLM prompts at significantly reduced lengths. An initial experimental evaluation of our approach on a healthcare use case, involving multi-faceted analyses of patient anamneses given a set of medical concepts as well as a series of clinical guideline literature, indicates its potential to optimize information density, coverage, and arrangement of LLM prompts while significantly reducing their lengths, which, in turn, may lead to reduced costs as well as more consistent and reliable LLM outputs.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2504.19667

Country: Europe > Germany (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Bio-KGvec2go: Serving up-to-date Dynamic Biomedical Knowledge Graph Embeddings

Ahmad, Hamid, Paulheim, Heiko, Sousa, Rita T.

arXiv.org Artificial IntelligenceSep-10-2025

Knowledge graphs and ontologies represent entities and their relationships in a structured way, having gained significance in the development of modern AI applications. Integrating these semantic resources with machine learning models often relies on knowledge graph embedding models to transform graph data into numerical representations. Therefore, pre-trained models for popular knowledge graphs and ontologies are increasingly valuable, as they spare the need to retrain models for different tasks using the same data, thereby helping to democratize AI development and enabling sustainable computing. In this paper, we present Bio-KGvec2go, an extension of the KGvec2go Web API, designed to generate and serve knowledge graph embeddings for widely used biomedical ontologies. Given the dynamic nature of these ontologies, Bio-KGvec2go also supports regular updates aligned with ontology version releases. By offering up-to-date embeddings with minimal computational effort required from users, Bio-KGvec2go facilitates efficient and timely biomedical research.

artificial intelligence, bio-kgvec2go, ontology, (13 more...)

arXiv.org Artificial Intelligence

2509.07905

Country: North America > United States (0.16)

Genre: Research Report (0.83)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)

Add feedback

Automated Hierarchical Graph Construction for Multi-source Electronic Health Records

Wang, Yinjie, Zhou, Doudou, Liu, Yue, Lu, Junwei, Cai, Tianxi

arXiv.org Machine LearningSep-9-2025

Electronic Health Records (EHRs), comprising diverse clinical data such as diagnoses, medications, and laboratory results, hold great promise for translational research. EHR-derived data have advanced disease prevention, improved clinical trial recruitment, and generated real-world evidence. Synthesizing EHRs across institutions enables large-scale, generalizable studies that capture rare diseases and population diversity, but remains hindered by the heterogeneity of medical codes, institution-specific terminologies, and the absence of standardized data structures. These barriers limit the interpretability, comparability, and scalability of EHR-based analyses, underscoring the need for robust methods to harmonize and extract meaningful insights from distributed, heterogeneous data. To address this, we propose MASH (Multi-source Automated Structured Hierarchy), a fully automated framework that aligns medical codes across institutions using neural optimal transport and constructs hierarchical graphs with learned hyperbolic embeddings. During training, MASH integrates information from pre-trained language models, co-occurrence patterns, textual descriptions, and supervised labels to capture semantic and hierarchical relationships among medical concepts more effectively. Applied to real-world EHR data, including diagnosis, medication, and laboratory codes, MASH produces interpretable hierarchical graphs that facilitate the navigation and understanding of heterogeneous clinical data. Notably, it generates the first automated hierarchies for unstructured local laboratory codes, establishing foundational references for downstream applications.

hierarchy, laboratory code, node, (17 more...)

arXiv.org Machine Learning

2509.06576

Country:

North America > United States > New York (0.04)
North America > United States > Indiana > Marion County > Indianapolis (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.87)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.69)
(2 more...)

Add feedback

Neuro-Symbolic AI for Cybersecurity: State of the Art, Challenges, and Opportunities

Hakim, Safayat Bin, Adil, Muhammad, Velasquez, Alvaro, Xu, Shouhuai, Song, Houbing Herbert

arXiv.org Artificial IntelligenceSep-9-2025

Traditional Artificial Intelligence (AI) approaches in cybersecurity exhibit fundamental limitations: inadequate conceptual grounding leading to non-robustness against novel attacks; limited instructibility impeding analyst-guided adaptation; and misalignment with cybersecurity objectives. Neuro-Symbolic (NeSy) AI has emerged with the potential to revolutionize cybersecurity AI. However, there is no systematic understanding of this emerging approach. These hybrid systems address critical cybersecurity challenges by combining neural pattern recognition with symbolic reasoning, enabling enhanced threat understanding while introducing concerning autonomous offensive capabilities that reshape threat landscapes. In this survey, we systematically characterize this field by analyzing 127 publications spanning 2019-July 2025. We introduce a Grounding-Instructibility-Alignment (G-I-A) framework to evaluate these systems, focusing on both cyber defense and cyber offense across network security, malware analysis, and cyber operations. Our analysis shows advantages of multi-agent NeSy architectures and identifies critical implementation challenges including standardization gaps, computational complexity, and human-AI collaboration requirements that constrain deployment. We show that causal reasoning integration is the most transformative advancement, enabling proactive defense beyond correlation-based approaches. Our findings highlight dual-use implications where autonomous systems demonstrate substantial capabilities in zero-day exploitation while achieving significant cost reductions, altering threat dynamics. We provide insights and future research directions, emphasizing the urgent need for community-driven standardization frameworks and responsible development practices that ensure advancement serves defensive cybersecurity objectives while maintaining societal alignment.

large language model, logic & formal reasoning, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2509.06921

Country:

North America > United States > Maryland (0.67)
North America > United States > Colorado (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Communications > Collaboration (1.00)
(10 more...)

Add feedback

Towards Ontology-Based Descriptions of Conversations with Qualitatively-Defined Concepts

Gendron, Barbara, Guibon, Gaël, D'aquin, Mathieu

arXiv.org Artificial IntelligenceSep-8-2025

The controllability of Large Language Models (LLMs) when used as conversational agents is a key challenge, particularly to ensure predictable and user-personalized responses. This work proposes an ontology-based approach to formally define conversational features that are typically qualitative in nature. By leveraging a set of linguistic descriptors, we derive quantitative definitions for qualitatively-defined concepts, enabling their integration into an ontology for reasoning and consistency checking. We apply this framework to the task of proficiency-level control in conversations, using CEFR language proficiency levels as a case study. These definitions are then formalized in description logic and incorporated into an ontology, which guides controlled text generation of an LLM through fine-tuning. Experimental results demonstrate that our approach provides consistent and explainable proficiency-level definitions, improving transparency in conversational AI.

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.04926

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Education > Curriculum > Subject-Specific Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

ODKE+: Ontology-Guided Open-Domain Knowledge Extraction with LLMs

Khorshidi, Samira, Nikfarjam, Azadeh, Shankar, Suprita, Sang, Yisi, Govind, Yash, Jang, Hyun, Kasgari, Ali, McClimans, Alexis, Soliman, Mohamed, Konda, Vishnu, Fakhry, Ahmed, Qi, Xiaoguang

arXiv.org Artificial IntelligenceSep-8-2025

Knowledge graphs (KGs) are foundational to many AI applications, but maintaining their freshness and completeness remains costly. We present ODKE+, a production-grade system that automatically extracts and ingests millions of open-domain facts from web sources with high precision. ODKE+ combines modular components into a scalable pipeline: (1) the Extraction Initiator detects missing or stale facts, (2) the Evidence Retriever collects supporting documents, (3) hybrid Knowledge Extractors apply both pattern-based rules and ontology-guided prompting for large language models (LLMs), (4) a lightweight Grounder validates extracted facts using a second LLM, and (5) the Corroborator ranks and normalizes candidate facts for ingestion. ODKE+ dynamically generates ontology snippets tailored to each entity type to align extractions with schema constraints, enabling scalable, type-consistent fact extraction across 195 predicates. The system supports batch and streaming modes, processing over 9 million Wikipedia pages and ingesting 19 million high-confidence facts with 98.8% precision. ODKE+ significantly improves coverage over traditional methods, achieving up to 48% overlap with third-party KGs and reducing update lag by 50 days on average. Our deployment demonstrates that LLM-based extraction, grounded in ontological structure and verification workflows, can deliver trustworthiness, production-scale knowledge ingestion with broad real-world applicability. A recording of the system demonstration is included with the submission and is also available at https://youtu.be/UcnE3_GsTWs.

extraction, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2509.04696

Country: North America > United States (0.47)

Genre: Research Report (0.64)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Ontology-Aligned Embeddings for Data-Driven Labour Market Analytics

Hihn, Heinke, Dittrich, Dennis A. V., Jeske, Carl, Sobral, Cayo Costa, Pais, Helio, Lochmann, Timm

arXiv.org Artificial IntelligenceSep-8-2025

The limited ability to reason across occupational data from different sources is a long-standing bottleneck for data-driven labour market analytics. Previous research has relied on hand-crafted ontologies that allow such reasoning but are computationally expensive and require careful maintenance by human experts. The rise of language processing machine learning models offers a scalable alternative by learning shared semantic spaces that bridge diverse occupational vocabularies without extensive human curation. We present an embedding-based alignment process that links any free-form German job title to two established ontologies - the German Klassifikation der Berufe and the International Standard Classification of Education. Using publicly available data from the German Federal Employment Agency, we construct a dataset to fine-tune a Sentence-BERT model to learn the structure imposed by the ontologies. The enriched pairs (job title, embedding) define a similarity graph structure that we can use for efficient approximate nearest-neighbour search, allowing us to frame the classification process as a semantic search problem. This allows for greater flexibility, e.g., adding more classes. We discuss design decisions, open challenges, and outline ongoing work on extending the graph with other ontologies and multilingual titles.

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.04942

Country: Europe > Germany (1.00)

Genre: Research Report (1.00)

Industry:

Banking & Finance (1.00)
Education > Educational Setting > Higher Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Towards an Action-Centric Ontology for Cooking Procedures Using Temporal Graphs

Kumbhakern, Aarush, Gupta, Saransh Kumar, Dey, Lipika, Das, Partha Pratim

arXiv.org Artificial IntelligenceSep-5-2025

Formalizing cooking procedures remains a challenging task due to their inherent complexity and ambiguity. We introduce an extensible domain-specific language for representing recipes as directed action graphs, capturing processes, transfers, environments, concurrency, and compositional structure. Our approach enables precise, modular modeling of complex culinary workflows. Initial manual evaluation on a full English breakfast recipe demonstrates the DSL's expressiveness and suitability for future automated recipe analysis and execution. This work represents initial steps towards an action-centric ontology for cooking, using temporal graphs to enable structured machine understanding, precise interpretation, and scalable automation of culinary processes - both in home kitchens and professional culinary settings.

artificial intelligence, concurrency, natural language, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3746264.3760499

2509.04159

Country: Europe (0.68)

Genre:

Workflow (0.67)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.62)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback