AITopics | Ontologies

Collaborating Authors

Ontologies

"An ontology defines the terms used to describe and represent an area of knowledge. … Ontologies include computer-usable definitions of basic concepts in the domain and the relationships among them."
– from OWL Web Ontology Language Use Cases and Requirements. W3C Recommendation (10 February 2004). Jeff Heflin, editor.

News Overviews Instructional Materials AI-Alerts Classics

Meronymic Ontology Extraction via Large Language Models

Zhang, Dekai, Conia, Simone, Rago, Antonio

arXiv.org Artificial IntelligenceNov-11-2025

Ontologies have become essential in today's digital age as a way of organising the vast amount of readily available unstructured text. In providing formal structure to this information, ontologies have immense value and application across various domains, e.g., e-commerce, where countless product listings necessitate proper product organisation. However, the manual construction of these ontologies is a time-consuming, expensive and laborious process. In this paper, we harness the recent advancements in large language models (LLMs) to develop a fully-automated method of extracting product ontologies, in the form of meronymies, from raw review texts. We demonstrate that the ontologies produced by our method surpass an existing, BERT-based baseline when evaluating using an LLM-as-a-judge. Our investigation provides the groundwork for LLMs to be used more generally in (product or otherwise) ontology extraction.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.13839

Country: Europe (0.28)

Genre: Research Report (0.82)

Industry: Information Technology (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Kastor: Fine-tuned Small Language Models for Shape-based Active Relation Extraction

Celian, Ringwald, Fabien, Gandon, Catherine, Faron, Franck, Michel, Hanna, Abi Akl

arXiv.org Artificial IntelligenceNov-6-2025

RDF pattern-based extraction is a compelling approach for fine-tuning small language models (SLMs) by focusing a relation extraction task on a specified SHACL shape. This technique enables the development of efficient models trained on limited text and RDF data. In this article, we introduce Kastor, a framework that advances this approach to meet the demands for completing and refining knowledge bases in specialized domains. Kastor reformulates the traditional validation task, shifting from single SHACL shape validation to evaluating all possible combinations of properties derived from the shape. By selecting the optimal combination for each training example, the framework significantly enhances model generalization and performance. Additionally, Kastor employs an iterative learning process to refine noisy knowledge bases, enabling the creation of robust models capable of uncovering new, relevant facts.

computational linguistic, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-94575-5_6

2511.03466

Country:

Europe (1.00)
North America > Mexico (0.28)

Genre: Research Report (0.65)

Industry: Leisure & Entertainment > Sports (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.54)
(2 more...)

Add feedback

Constraint-Driven Small Language Models Based on Agent and OpenAlex Knowledge Graph: Mining Conceptual Pathways and Discovering Innovation Points in Academic Papers

Xia, Ziye, Ospichev, Sergei S.

arXiv.org Artificial IntelligenceNov-6-2025

In recent years, the rapid increase in academic publications across various fields has posed severe challenges for academic paper analysis: scientists struggle to timely and comprehensively track the latest research findings and methodologies. Key concept extraction has proven to be an effective analytical paradigm, and its automation has been achieved with the widespread application of language models in industrial and scientific domains. However, existing paper databases are mostly limited to similarity matching and basic classification of key concepts, failing to deeply explore the relational networks between concepts. This paper is based on the OpenAlex opensource knowledge graph. By analyzing nearly 8,000 open-source paper data from Novosibirsk State University, we discovered a strong correlation between the distribution patterns of paper key concept paths and both innovation points and rare paths. We propose a prompt engineering-based key concept path analysis method. This method leverages small language models to achieve precise key concept extraction and innovation point identification, and constructs an agent based on a knowledge graph constraint mechanism to enhance analysis accuracy. Through fine-tuning of the Qwen and DeepSeek models, we achieved significant improvements in accuracy, with the models publicly available on the Hugging Face platform.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.14303

Country: Asia > Russia > Siberian Federal District > Novosibirsk Oblast > Novosibirsk (0.25)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Add feedback

InteracSPARQL: An Interactive System for SPARQL Query Refinement Using Natural Language Explanations

Jian, Xiangru, Dong, Zhengyuan, Özsu, M. Tamer

arXiv.org Artificial IntelligenceNov-5-2025

In recent years, querying semantic web data using SPARQL has remained challenging, especially for non-expert users, due to the language's complex syntax and the prerequisite of understanding intricate data structures. To address these challenges, we propose InteracSPARQL, an interactive SPARQL query generation and refinement system that leverages natural language explanations (NLEs) to enhance user comprehension and facilitate iterative query refinement. InteracSPARQL integrates LLMs with a rule-based approach to first produce structured explanations directly from SPARQL abstract syntax trees (ASTs), followed by LLM-based linguistic refinements. Users can interactively refine queries through direct feedback or LLM-driven self-refinement, enabling the correction of ambiguous or incorrect query components in real time. We evaluate InteracSPARQL on standard benchmarks, demonstrating significant improvements in query accuracy, explanation clarity, and overall user satisfaction compared to baseline approaches. Our experiments further highlight the effectiveness of combining rule-based methods with LLM-driven refinements to create more accessible and robust SPARQL interfaces.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.02002

Genre: Research Report (1.00)

Industry: Media (0.46)

Technology:

Information Technology > Communications > Web > Semantic Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
(2 more...)

Add feedback

S2Doc -- Spatial-Semantic Document Format

Kempf, Sebastian, Puppe, Frank

arXiv.org Artificial IntelligenceNov-4-2025

Documents are a common way to store and share information, with tables being an important part of many documents. However, there is no real common understanding of how to model documents and tables in particular. Because of this lack of standardization, most scientific approaches have their own way of modeling documents and tables, leading to a variety of different data structures and formats that are not directly compatible. Furthermore, most data models focus on either the spatial or the semantic structure of a document, neglecting the other aspect. To address this, we developed S2Doc, a flexible data structure for modeling documents and tables that combines both spatial and semantic information in a single format. It is designed to be easily extendable to new tasks and supports most modeling approaches for documents and tables, including multi-page documents. To the best of our knowledge, it is the first approach of its kind to combine all these aspects in a single format.

artificial intelligence, information, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.01113

Genre:

Workflow (0.47)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.70)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)

Add feedback

Capturing Legal Reasoning Paths from Facts to Law in Court Judgments using Knowledge Graphs

Kondo, Ryoma, Matsuoka, Riona, Yoshida, Takahiro, Yamasawa, Kazuyuki, Hisano, Ryohei

arXiv.org Artificial IntelligenceNov-4-2025

Court judgments reveal how legal rules have been interpreted and applied to facts, providing a foundation for understanding structured legal reasoning. However, existing automated approaches for capturing legal reasoning, including large language models, often fail to identify the relevant legal context, do not accurately trace how facts relate to legal norms, and may misrepresent the layered structure of judicial reasoning. These limitations hinder the ability to capture how courts apply the law to facts in practice. In this paper, we address these challenges by constructing a legal knowledge graph from 648 Japanese administrative court decisions. Our method extracts components of legal reasoning using prompt-based large language models, normalizes references to legal provisions, and links facts, norms, and legal applications through an ontology of legal inference. The resulting graph captures the full structure of legal reasoning as it appears in real court decisions, making implicit reasoning explicit and machine-readable. We evaluate our system using expert annotated data, and find that it achieves more accurate retrieval of relevant legal provisions from facts than large language model baselines and retrieval-augmented methods.

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2508.1734

Country:

North America (0.93)
Asia > Japan > Honshū (0.15)

Genre: Research Report > New Finding (0.68)

Industry: Law > Litigation (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.69)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.69)

Add feedback

Evontree: Ontology Rule-Guided Self-Evolution of Large Language Models

Tu, Mingchen, Liu, Zhiqiang, Li, Juan, Liu, Liangyurui, Wang, Junjie, Liang, Lei, Zhang, Wen

arXiv.org Artificial IntelligenceOct-31-2025

Large language models (LLMs) have demonstrated exceptional capabilities across multiple domains by leveraging massive pre-training and curated fine-tuning data. However, in data-sensitive fields such as healthcare, the lack of high-quality, domain-specific training corpus hinders LLMs' adaptation for specialized applications. Meanwhile, domain experts have distilled domain wisdom into ontology rules, which formalize relationships among concepts and ensure the integrity of knowledge management repositories. Viewing LLMs as implicit repositories of human knowledge, we propose Evontree--a novel framework that leverages a small set of high-quality ontology rules to systematically extract, validate, and enhance domain knowledge within LLMs, without requiring extensive external datasets. Specifically, Evontree extracts domain ontology from raw models, detects inconsistencies using two core ontology rules, and reinforces the refined knowledge via self-distilled fine-tuning. Extensive experiments on medical QA benchmarks with Llama3-8B-Instruct and Med42-v2 demonstrate consistent outperformance over both unmodified models and leading supervised baselines, achieving up to a 3.7% improvement in accuracy. These results confirm the effectiveness, efficiency, and robustness of our approach for low-resource domain adaptation of LLMs.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.26683

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It

Qin, Yulu, Varghese, Dheeraj, Lindström, Adam Dahlgren, Donatelli, Lucia, Misra, Kanishka, Kim, Najoung

arXiv.org Artificial IntelligenceOct-31-2025

Does vision-and-language (VL) training change the linguistic representations of language models in meaningful ways? Most results in the literature have shown inconsistent or marginal differences, both behaviorally and representationally. In this work, we start from the hypothesis that the domain in which VL training could have a significant effect is lexical-conceptual knowledge, in particular its taxonomic organization. Through comparing minimal pairs of text-only LMs and their VL-trained counterparts, we first show that the VL models often outperform their text-only counterparts on a text-only question-answering task that requires taxonomic understanding of concepts mentioned in the questions. Using an array of targeted behavioral and representational analyses, we show that the LMs and VLMs do not differ significantly in terms of their taxonomic knowledge itself, but they differ in how they represent questions that contain concepts in a taxonomic relation vs. a non-taxonomic relation. This implies that the taxonomic knowledge itself does not change substantially through additional VL training, but VL training does improve the deployment of this knowledge in the context of a specific task, even when the presentation of the task is purely linguistic.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2507.13328

Country:

Europe (1.00)
North America > United States (0.92)
Asia (0.68)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
(2 more...)

Add feedback

OntoPret: An Ontology for the Interpretation of Human Behavior

Ellis, Alexis, Severyn, Stacie, Novakazi, Fjollë, Banaee, Hadi, Shimizu, Cogan

arXiv.org Artificial IntelligenceOct-28-2025

As human machine teaming becomes central to paradigms like Industry 5.0, a critical need arises for machines to safely and effectively interpret complex human behaviors. A research gap currently exists between techno centric robotic frameworks, which often lack nuanced models of human behavior, and descriptive behavioral ontologies, which are not designed for real time, collaborative interpretation. This paper addresses this gap by presenting OntoPret, an ontology for the interpretation of human behavior. Grounded in cognitive science and a modular engineering methodology, OntoPret provides a formal, machine processable framework for classifying behaviors, including task deviations and deceptive actions. We demonstrate its adaptability across two distinct use cases manufacturing and gameplay and establish the semantic foundations necessary for advanced reasoning about human intentions.

artificial intelligence, interpretation, ontopret, (14 more...)

arXiv.org Artificial Intelligence

2510.23553

Country: Europe > Austria (0.28)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

Entity-Augmented Neuroscience Knowledge Retrieval Using Ontology and Semantic Understanding Capability of LLM

Ta, Pralaypati, Venkatesaperumal, Sriram, Ram, Keerthi, Sivaprakasam, Mohanasankar

arXiv.org Artificial IntelligenceOct-28-2025

Neuroscience research publications encompass a vast wealth of knowledge. Accurately retrieving existing information and discovering new insights from this extensive literature is essential for advancing the field. However, when knowledge is dispersed across multiple sources, current state-of-the-art retrieval methods often struggle to extract the necessary information. A knowledge graph (KG) can integrate and link knowledge from multiple sources. However, existing methods for constructing KGs in neuroscience often rely on labeled data and require domain expertise. Acquiring large-scale, labeled data for a specialized area like neuroscience presents significant challenges. This work proposes novel methods for constructing KG from unlabeled large-scale neuroscience research corpus utilizing large language models (LLM), neuroscience ontology, and text embeddings. We analyze the semantic relevance of neuroscience text segments identified by LLM for building the knowledge graph. We also introduce an entity-augmented information retrieval algorithm to extract knowledge from the KG. Several experiments were conducted to evaluate the proposed approaches. The results demonstrate that our methods significantly enhance knowledge discovery from the unlabeled neuroscience research corpus. The performance of the proposed entity and relation extraction method is comparable to the existing supervised method. It achieves an F1 score of 0.84 for entity extraction from the unlabeled data. The knowledge obtained from the KG improves answers to over 52% of neuroscience questions from the PubMedQA dataset and questions generated using selected neuroscience entities.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.03145

Country: Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback