Goto

Collaborating Authors

 Ontologies


Contextual Semantic Embeddings for Ontology Subsumption Prediction

arXiv.org Artificial Intelligence

Automating ontology construction and curation is an important but challenging task in knowledge engineering and artificial intelligence. Prediction by machine learning techniques such as contextual semantic embedding is a promising direction, but the relevant research is still preliminary especially for expressive ontologies in Web Ontology Language (OWL). In this paper, we present a new subsumption prediction method named BERTSubs for classes of OWL ontology. It exploits the pre-trained language model BERT to compute contextual embeddings of a class, where customized templates are proposed to incorporate the class context (e.g., neighbouring classes) and the logical existential restriction. BERTSubs is able to predict multiple kinds of subsumers including named classes from the same ontology or another ontology, and existential restrictions from the same ontology. Extensive evaluation on five real-world ontologies for three different subsumption tasks has shown the effectiveness of the templates and that BERTSubs can dramatically outperform the baselines that use (literal-aware) knowledge graph embeddings, non-contextual word embeddings and the state-of-the-art OWL ontology embeddings.


OntoMath${}^{\mathbf{PRO}}$ 2.0 Ontology: Updates of the Formal Model

arXiv.org Artificial Intelligence

This paper is devoted to the problems of ontology-based mathematical knowledge management and representation. The main attention is paid to the development of a formal model for the representation of mathematical statements in the Open Linked Data cloud. The proposed model is intended for applications that extract mathematical facts from natural language mathematical texts and represent these facts as Linked Open Data. The model is used in development of a new version of the OntoMath${}^{\mathrm{PRO}}$ ontology of professional mathematics is described. OntoMath${}^{\mathrm{PRO}}$ underlies a semantic publishing platform, that takes as an input a collection of mathematical papers in LaTeX format and builds their ontology-based Linked Open Data representation. The semantic publishing platform, in turn, is a central component of OntoMath digital ecosystem, an ecosystem of ontologies, text analytics tools, and applications for mathematical knowledge management, including semantic search for mathematical formulas and a recommender system for mathematical papers. According to the new model, the ontology is organized into three layers: a foundational ontology layer, a domain ontology layer and a linguistic layer. The domain ontology layer contains language-independent math concepts. The linguistic layer provides linguistic grounding for these concepts, and the foundation ontology layer provides them with meta-ontological annotations. The concepts are organized in two main hierarchies: the hierarchy of objects and the hierarchy of reified relationships.


SUAVE: An Exemplar for Self-Adaptive Underwater Vehicles

arXiv.org Artificial Intelligence

Once deployed in the real world, autonomous underwater vehicles (AUVs) are out of reach for human supervision yet need to take decisions to adapt to unstable and unpredictable environments. To facilitate research on self-adaptive AUVs, this paper presents SUAVE, an exemplar for two-layered system-level adaptation of AUVs, which clearly separates the application and self-adaptation concerns. The exemplar focuses on a mission for underwater pipeline inspection by a single AUV, implemented as a ROS2-based system. This mission must be completed while simultaneously accounting for uncertainties such as thruster failures and unfavorable environmental conditions. The paper discusses how SUAVE can be used with different self-adaptation frameworks, illustrated by an experiment using the Metacontrol framework to compare AUV behavior with and without self-adaptation. The experiment shows that the use of Metacontrol to adapt the AUV during its mission improves its performance when measured by the overall time taken to complete the mission or the length of the inspected pipeline.


The Life Cycle of Knowledge in Big Language Models: A Survey

arXiv.org Artificial Intelligence

Knowledge plays a critical role in artificial intelligence. Recently, the extensive success of pre-trained language models (PLMs) has raised significant attention about how knowledge can be acquired, maintained, updated and used by language models. Despite the enormous amount of related studies, there still lacks a unified view of how knowledge circulates within language models throughout the learning, tuning, and application processes, which may prevent us from further understanding the connections between current progress or realizing existing limitations. In this survey, we revisit PLMs as knowledge-based systems by dividing the life circle of knowledge in PLMs into five critical periods, and investigating how knowledge circulates when it is built, maintained and used. To this end, we systematically review existing studies of each period of the knowledge life cycle, summarize the main challenges and current limitations, and discuss future directions.


Deciding FO-rewritability of Regular Languages and Ontology-Mediated Queries in Linear Temporal Logic

Journal of Artificial Intelligence Research

Our concern is the problem of determining the data complexity of answering an ontology-mediated query (OMQ) formulated in linear temporal logic LTL over (Z,<) and deciding whether it is rewritable to an FO(<)-query, possibly with some extra predicates. First, we observe that, in line with the circuit complexity and FO-definability of regular languages, OMQ answering in AC0, ACC0ย and NC1ย coincides with FO(<,โ‰ก)-rewritability using unary predicatesย xย โ‰ก 0 (modย n), FO(<,MOD)-rewritability, and FO(RPR)-rewritability using relational primitive recursion, respectively. We prove that, similarly to known PSแด˜แด€แด„แด‡-completeness of recognising FO(<)-definability of regular languages, deciding FO(<,โ‰ก)- and FO(<,MOD)-definability is also PSแด˜แด€แด„แด‡-complete (unless ACC0ย = NC1). We then use this result to show that deciding FO(<)-, FO(<,โ‰ก)- and FO(<,MOD)-rewritability of LTL OMQs is Exแด˜Sแด˜แด€แด„แด‡-complete, and that these problems become PSแด˜แด€แด„แด‡-complete for OMQs with a linear Horn ontology and an atomic query, and also a positive query in the cases of FO(<)- and FO(<,โ‰ก)-rewritability. Further, we consider FO(<)-rewritability of OMQs with a binary-clause ontology and identify OMQ classes, for which deciding it is PSแด˜แด€แด„แด‡-, ฮ 2p- and coNP-complete.


Linked Data Science Powered by Knowledge Graphs

arXiv.org Artificial Intelligence

In recent years, we have witnessed a growing interest in data science not only from academia but particularly from companies investing in data science platforms to analyze large amounts of data. In this process, a myriad of data science artifacts, such as datasets and pipeline scripts, are created. Yet, there has so far been no systematic attempt to holistically exploit the collected knowledge and experiences that are implicitly contained in the specification of these pipelines, e.g., compatible datasets, cleansing steps, ML algorithms, parameters, etc. Instead, data scientists still spend a considerable amount of their time trying to recover relevant information and experiences from colleagues, trial and error, lengthy exploration, etc. In this paper, we, therefore, propose a scalable system (KGLiDS) that employs machine learning to extract the semantics of data science pipelines and captures them in a knowledge graph, which can then be exploited to assist data scientists in various ways. This abstraction is the key to enabling Linked Data Science since it allows us to share the essence of pipelines between platforms, companies, and institutions without revealing critical internal information and instead focusing on the semantics of what is being processed and how. Our comprehensive evaluation uses thousands of datasets and more than thirteen thousand pipeline scripts extracted from data discovery benchmarks and the Kaggle portal and shows that KGLiDS significantly outperforms state-of-the-art systems on related tasks, such as dataset recommendation and pipeline classification.


kogito: A Commonsense Knowledge Inference Toolkit

arXiv.org Artificial Intelligence

In this paper, we present kogito, an open-source tool for generating commonsense inferences about situations described in text. kogito provides an intuitive and extensible interface to interact with natural language generation models that can be used for hypothesizing commonsense knowledge inference from a textual input. In particular, kogito offers several features for targeted, multi-granularity knowledge generation. These include a standardized API for training and evaluating knowledge models, and generating and filtering inferences from them. We also include helper functions for converting natural language texts into a format ingestible by knowledge models - intermediate pipeline stages such as knowledge head extraction from text, heuristic and model-based knowledge head-relation matching, and an ability to define and use custom knowledge relations. We make the code for kogito available at https://github.com/epfl-nlp/kogito along with thorough documentation at https://kogito.readthedocs.io.


Knowledge-augmented Graph Machine Learning for Drug Discovery: A Survey from Precision to Interpretability

arXiv.org Artificial Intelligence

The integration of Artificial Intelligence (AI) into the field of drug discovery has been a growing area of interdisciplinary scientific research. However, conventional AI models are heavily limited in handling complex biomedical structures (such as 2D or 3D protein and molecule structures) and providing interpretations for outputs, which hinders their practical application. As of late, Graph Machine Learning (GML) has gained considerable attention for its exceptional ability to model graph-structured biomedical data and investigate their properties and functional relationships. Despite extensive efforts, GML methods still suffer from several deficiencies, such as the limited ability to handle supervision sparsity and provide interpretability in learning and inference processes, and their ineffectiveness in utilising relevant domain knowledge. In response, recent studies have proposed integrating external biomedical knowledge into the GML pipeline to realise more precise and interpretable drug discovery with limited training instances. However, a systematic definition for this burgeoning research direction is yet to be established. This survey presents a comprehensive overview of long-standing drug discovery principles, provides the foundational concepts and cutting-edge techniques for graph-structured data and knowledge databases, and formally summarises Knowledge-augmented Graph Machine Learning (KaGML) for drug discovery. we propose a thorough review of related KaGML works, collected following a carefully designed search methodology, and organise them into four categories following a novel-defined taxonomy. To facilitate research in this promptly emerging field, we also share collected practical resources that are valuable for intelligent drug discovery and provide an in-depth discussion of the potential avenues for future advancements.


Implementation of a noisy hyperlink removal system: A semantic and relatedness approach

arXiv.org Artificial Intelligence

As the volume of data on the web grows, the web structure graph, which is a graph representation of the web, continues to evolve. The structure of this graph has gradually shifted from content-based to non-content-based. Furthermore, spam data, such as noisy hyperlinks, in the web structure graph adversely affect the speed and efficiency of information retrieval and link mining algorithms. Previous works in this area have focused on removing noisy hyperlinks using structural and string approaches. However, these approaches may incorrectly remove useful links or be unable to detect noisy hyperlinks in certain circumstances. In this paper, a data collection of hyperlinks is initially constructed using an interactive crawler. The semantic and relatedness structure of the hyperlinks is then studied through semantic web approaches and tools such as the DBpedia ontology. Finally, the removal process of noisy hyperlinks is carried out using a reasoner on the DBpedia ontology. Our experiments demonstrate the accuracy and ability of semantic web technologies to remove noisy hyperlinks


Towards a GML-Enabled Knowledge Graph Platform

arXiv.org Artificial Intelligence

This vision paper proposes KGNet, an on-demand graph machine learning (GML) as a service on top of RDF engines to support GML-enabled SPARQL queries. KGNet automates the training of GML models on a KG by identifying a task-specific subgraph. This helps reduce the task-irrelevant KG structure and properties for better scalability and accuracy. While training a GML model on KG, KGNet collects metadata of trained models in the form of an RDF graph called KGMeta, which is interlinked with the relevant subgraphs in KG. Finally, all trained models are accessible via a SPARQL-like query. We call it a GML-enabled query and refer to it as SPARQLML. KGNet supports SPARQLML on top of existing RDF engines as an interface for querying and inferencing over KGs using GML models. The development of KGNet poses research opportunities in several areas, including meta-sampling for identifying task-specific subgraphs, GML pipeline automation with computational constraints, such as limited time and memory budget, and SPARQLML query optimization. KGNet supports different GML tasks, such as node classification, link prediction, and semantic entity matching. We evaluated KGNet using two real KGs of different application domains. Compared to training on the entire KG, KGNet significantly reduced training time and memory usage while maintaining comparable or improved accuracy. The KGNet source-code is available for further study