Ontologies
Box Embeddings for the Description Logic EL++
Xiong, Bo, Potyka, Nico, Tran, Trung-Kien, Nayyeri, Mojtaba, Staab, Steffen
Recently, various methods for representation learning on Knowledge Bases (KBs) have been developed. However, these approaches either only focus on learning the embeddings of the data-level knowledge (ABox) or exhibit inherent limitations when dealing with the concept-level knowledge (TBox), e.g., not properly modelling the structure of the logical knowledge. We present BoxEL, a geometric KB embedding approach that allows for better capturing logical structure expressed in the theories of Description Logic EL++. BoxEL models concepts in a KB as axis-parallel boxes exhibiting the advantage of intersectional closure, entities as points inside boxes, and relations between concepts/entities as affine transformations. We show theoretical guarantees (soundness) of BoxEL for preserving logical structure. Namely, the trained model of BoxEL embedding with loss 0 is a (logical) model of the KB. Experimental results on subsumption reasoning and a real-world application--protein-protein prediction show that BoxEL outperforms traditional knowledge graph embedding methods as well as state-of-the-art EL++ embedding approaches.
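The geometric reading in the abstract above (concepts as axis-parallel boxes, entities as points, relations as affine maps) can be illustrated with a minimal sketch. This is not the BoxEL implementation: the `Box` class, the per-dimension `affine` helper, and the concept names and coordinates are all hypothetical, chosen only to show how containment corresponds to subsumption and why intersecting two boxes yields another box.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-parallel box described by its lower and upper corners (hypothetical helper)."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]

    def contains_point(self, p: Sequence[float]) -> bool:
        # An entity embedded as point p is an instance of the concept if p lies in the box.
        return all(lo <= x <= hi for lo, x, hi in zip(self.lower, p, self.upper))

    def contains_box(self, other: "Box") -> bool:
        # "other is subsumed by self" is read geometrically as box containment.
        return all(
            a <= c and d <= b
            for a, b, c, d in zip(self.lower, self.upper, other.lower, other.upper)
        )

    def intersect(self, other: "Box") -> Optional["Box"]:
        # The intersection of two axis-parallel boxes is again a box (or empty).
        lower = tuple(max(a, b) for a, b in zip(self.lower, other.lower))
        upper = tuple(min(a, b) for a, b in zip(self.upper, other.upper))
        if any(lo > hi for lo, hi in zip(lower, upper)):
            return None  # disjoint concepts
        return Box(lower, upper)


def affine(p: Sequence[float], scale: Sequence[float], shift: Sequence[float]) -> Tuple[float, ...]:
    # Assumption: a relation is modelled as a per-dimension scale plus a translation.
    return tuple(s * x + t for s, x, t in zip(scale, p, shift))


# Made-up two-dimensional embeddings, for illustration only.
parent = Box((0.0, 0.0), (4.0, 4.0))
male = Box((0.0, 1.0), (2.0, 5.0))
father = Box((1.0, 1.0), (2.0, 3.0))
entity = (1.5, 2.0)

parent_and_male = parent.intersect(male)
print(parent.contains_box(father))           # Father subsumed by Parent
print(parent_and_male.contains_box(father))  # Father subsumed by Parent AND Male
print(father.contains_point(entity))         # entity is an instance of Father
print(parent.contains_point(affine(entity, (1.0, 1.0), (1.0, 1.0))))
```

In a trained model the coordinates would come from minimizing a loss; here they are fixed by hand so that the containment checks can be followed directly.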
Scaling Up Knowledge Graph Creation to Large and Heterogeneous Data Sources
Iglesias, Enrique, Jozashoori, Samaneh, Vidal, Maria-Esther
RDF knowledge graphs (KG) are powerful data structures to represent factual statements created from heterogeneous data sources. KG creation is laborious and demands data management techniques to be executed efficiently. This paper tackles the problem of automatically generating declaratively specified KG creation processes; it proposes techniques for planning and transforming heterogeneous data into RDF triples following mapping assertions specified in the RDF Mapping Language (RML). Given a set of mapping assertions, the planner provides an optimized execution plan by partitioning and scheduling the execution of the assertions. First, the planner assesses an optimized number of partitions considering the number of data sources, type of mapping assertions, and the associations between different assertions. After providing a list of partitions and assertions that belong to each partition, the planner determines their execution order. A greedy algorithm is implemented to generate the partitions' bushy tree execution plan. Bushy tree plans are translated into operating system commands that guide the execution of the partitions of the mapping assertions in the order indicated by the bushy tree. The proposed optimization approach is evaluated over state-of-the-art RML-compliant engines and existing benchmarks of data sources and RML triples maps. Our experimental results suggest that the performance of the studied engines can be considerably improved, particularly in a complex setting with numerous triples maps and data sources. As a result, engines that usually time out in complex cases can, if not entirely execute all the assertions, still produce a portion of the KG.
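The two planning steps described above (grouping associated mapping assertions into partitions, then greedily arranging the partitions into a bushy tree) can be sketched as follows. This is an illustrative stand-in, not the planner from the paper: it assumes that assertions linked by an association must share a partition (connected components via union-find) and that the greedy step always pairs the two smallest subplans. The function names and the triples-map identifiers are hypothetical.

```python
import heapq
from itertools import count


def partition(assertions, associations):
    """Group assertions that are connected by associations (union-find)."""
    parent = {a: a for a in assertions}

    def find(a):
        # Follow parent links to the representative, compressing the path.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in associations:
        parent[find(a)] = find(b)

    groups = {}
    for a in assertions:
        groups.setdefault(find(a), []).append(a)
    # Smallest partitions first, ties broken by content for determinism.
    return sorted(groups.values(), key=lambda g: (len(g), g))


def greedy_bushy_plan(partitions):
    """Repeatedly pair the two smallest subplans until a single tree remains."""
    if not partitions:
        return None
    tie = count()  # unique tie-breaker so the heap never compares plan nodes
    heap = [(len(p), next(tie), ("partition", tuple(p))) for p in partitions]
    heapq.heapify(heap)
    while len(heap) > 1:
        size_left, _, left = heapq.heappop(heap)
        size_right, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (size_left + size_right, next(tie), ("group", left, right)))
    return heap[0][2]


# Hypothetical triples maps; one association links the gene and protein maps.
assertions = ["tm_gene", "tm_protein", "tm_drug", "tm_sample"]
associations = [("tm_gene", "tm_protein")]

parts = partition(assertions, associations)
plan = greedy_bushy_plan(parts)
print(parts)
print(plan)
```

Because inner nodes may have subtrees on both sides, the resulting plan is bushy rather than left-deep; independent subtrees are the parts that could, in principle, be executed in parallel.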
Ontologies and Semantic Annotation. Part 1: What Is an Ontology - DataScienceCentral.com
In the abundance of information, both machines and human researchers need tools to navigate and process it. Structuring and formalization of data into hierarchies, such as trees, may establish the relations between the data required for efficient machine processing and may make the information more readable for data analysts. Yet, in more complex domains, such as in natural language processing, relations between concepts go beyond simple hierarchies and form thesaurus-like networks. For such cases, researchers use ontologies as common vocabularies for specialists who need to share information in a domain. Ontologies were first defined as "explicit formal specifications of the terms in the domain and relations among them" (Gruber 1993) and, more specifically, "a formal, explicit specification of a shared conceptualization" (Studer et al. 1998) and are used in a number of applications, including the following, as specified by Noy and McGuinness (Noy and McGuinness 2001): Ontologies are the tools to provide a comprehensive description of the domain of interest with respect to the users' needs. It is something that we see when, for example, medical information is published on several different websites.
A Knowledge-driven Business Process Analysis Canvas
Business process (BP) analysis represents a first key phase of information system development. It consists in the gathering of domain knowledge and its organization to be later used in the software development, and beyond (e.g., for Business Process Reengineering). The quality of the developed information system largely depends on how the BP analysis has been carried out and the quality of the produced requirement specification documents. Despite the fact that the issue has been on the table for decades, business process analysis is still a critical phase of information systems development. One promising strategy is an earlier and greater involvement of business experts in the BP analysis. This paper presents a methodology that aims at an early involvement of business experts while providing a formal grounding that guarantees the quality of the produced specifications. To this end, we propose the Business Process Analysis Canvas, a knowledge framework organized in eight knowledge sections aimed at supporting the business expert in carrying out the analysis, eventually yielding a BP analysis Ontology.
An Automatic Ontology Generation Framework with An Organizational Perspective
Elnagar, Samaa, Yoon, Victoria, Thomas, Manoj A.
Ontologies have been known for their semantic representation of knowledge. However, ontologies cannot automatically evolve to reflect updates that occur in their respective domains. To address this limitation, researchers have called for automatic ontology generation from unstructured text corpora. Unfortunately, systems that aim to generate ontologies from unstructured text corpora are domain-specific and require manual intervention. In addition, they suffer from uncertainty in creating concept linkages and difficulty in finding axioms for the same concept. Knowledge Graphs (KGs) have emerged as a powerful model for the dynamic representation of knowledge. However, KGs have many quality limitations and need extensive refinement. This research aims to develop a novel domain-independent automatic ontology generation framework that converts an unstructured text corpus into a domain-consistent ontological form. The framework generates KGs from an unstructured text corpus and then refines and corrects them to be consistent with domain ontologies. The power of the proposed automatically generated ontology is that it integrates the dynamic features of KGs and the quality features of ontologies.
Specifying and Reasoning about CPS through the Lens of the NIST CPS Framework
Nguyen, Thanh Hai, Bundas, Matthew, Son, Tran Cao, Balduccini, Marcello, Garwood, Kathleen Campbell, Griffor, Edward R.
This paper introduces a formal definition of a Cyber-Physical System (CPS) in the spirit of the CPS Framework proposed by the National Institute of Standards and Technology (NIST). It shows that using this definition, various problems related to concerns in a CPS can be precisely formalized and implemented using Answer Set Programming (ASP). These include problems related to the dependency or conflicts between concerns, how to mitigate an issue, and what the most suitable mitigation strategy for a given issue would be. It then shows how ASP can be used to develop an implementation that addresses the aforementioned problems. The paper concludes with a discussion of the potentials of the proposed methodologies.
Fantastic Data and How to Query Them
Tran, Trung-Kien, Le-Tuan, Anh, Nguyen-Duc, Manh, Yuan, Jicheng, Le-Phuoc, Danh
It is commonly acknowledged that the availability of huge amounts of (training) data is one of the most important factors for many recent advances in Artificial Intelligence (AI). However, datasets are often designed for specific tasks in narrow AI sub-areas and there is no unified way to manage and access them. This not only creates unnecessary overheads when training or deploying Machine Learning models but also limits the understanding of the data, which is very important for data-centric AI. In this paper, we present our vision of a unified framework for different datasets so that they can be integrated and queried easily, e.g., using standard query languages. We demonstrate this in our ongoing work to create a framework for datasets in Computer Vision and show its advantages in different scenarios.
Transforming UNL graphs in OWL representations
Rouquet, David, Bellynck, Valérie, Boitet, Christian, Berment, Vincent
Extracting formal knowledge (ontologies) from natural language is a challenge that can benefit from a (semi-) formal linguistic representation of texts, at the semantic level. We propose to achieve such a representation by implementing the Universal Networking Language (UNL) specifications on top of RDF. Thus, the meaning of a statement in any language will be soundly expressed as an RDF-UNL graph that constitutes a middle ground between natural language and formal knowledge. In particular, we show that RDF-UNL graphs can support content extraction using generic SHACL rules and that reasoning on the extracted facts allows detecting incoherence in the original texts. This approach is being tested in the UNseL project, which aims at extracting ontological representations from system requirements/specifications in order to check that they are consistent, complete and unambiguous. Our RDF-UNL implementation and all code for the working examples of this paper are publicly available under the CeCILL-B license at https://gitlab.tetras-libre.fr/unl/rdf-unl
DPCL: a Language Template for Normative Specifications
Sileno, Giovanni, van Binsbergen, Thomas, Pascucci, Matteo, van Engers, Tom
Several solutions for specifying normative artefacts (norms, contracts, policies) in a computationally processable way have been presented in the literature. Legal core ontologies have been proposed to systematize concepts and relationships relevant to normative reasoning. However, no solution amongst those has achieved general acceptance, and no common ground (representational, computational) has been identified enabling us to easily compare them. Yet, all these efforts share the same motivation of representing normative directives, so it is plausible that there may be a representational model encompassing all of them. This presentation will introduce DPCL, a domain-specific language (DSL) for specifying higher-level policies (including norms, contracts, etc.), centred on Hohfeld's framework of fundamental legal concepts. DPCL has to be seen primarily as a "template", i.e. as an informational model for architectural reference, rather than a fully-fledged formal language; it aims to make explicit the general requirements that should be expected in a language for norm specification. In this respect, it goes rather in the direction of legal core ontologies, but unlike those, our proposal aims to keep the character of a DSL, rather than a set of axioms in a logical framework: it is meant to be cross-compiled to underlying languages/tools adequate to the type of target application. We provide here an overview of some of the language features.
Acquisition and Representation of User Preferences Guided by an Ontology
Dandan, Rahma, Despres, Sylvie, Sedki, Karima
Our food preferences guide our food choices and in turn affect our personal health and our social life. In this paper, we adopt an approach using a domain ontology expressed in OWL2 to support the acquisition and representation of preferences in the CP-Net formalism. Specifically, we present the construction of the domain ontology and the questionnaire design used to acquire and represent the preferences. The acquisition and representation of preferences are implemented in the setting of a university canteen. Our main contribution in this preliminary work is to acquire preferences and enrich the preference model with domain knowledge represented in the ontology.
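For readers unfamiliar with CP-Nets, the following minimal sketch shows the kind of structure the abstract above refers to: each variable has a conditional preference table that orders its values given the values of its parent variables. The variables, dish names, and the `best_outcome` helper are hypothetical and are not taken from the paper's ontology or questionnaire. For an acyclic CP-Net, the most preferred outcome can be found by a forward sweep that assigns each variable its top-ranked value given its parents.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# parents[var] lists the variables that var's preferences depend on.
parents = {"main": [], "side": ["main"]}

# cpt[var][parent values] lists var's values from most to least preferred.
cpt = {
    "main": {(): ["fish", "meat"]},
    "side": {
        ("fish",): ["rice", "fries"],
        ("meat",): ["fries", "rice"],
    },
}


def best_outcome(parents, cpt, fixed=None):
    """Forward sweep: visit variables parents-first and pick the preferred value."""
    outcome = dict(fixed or {})
    # TopologicalSorter takes a mapping node -> predecessors, so parents come first.
    for var in TopologicalSorter(parents).static_order():
        if var in outcome:
            continue  # value imposed from outside, e.g. the dish of the day
        context = tuple(outcome[p] for p in parents[var])
        outcome[var] = cpt[var][context][0]
    return outcome


print(best_outcome(parents, cpt))
print(best_outcome(parents, cpt, fixed={"main": "meat"}))
```

In an ontology-guided setting, the value domains (here the dishes and sides) would presumably be drawn from the domain ontology rather than written by hand.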