Ontologies
Data Quality in Ontology-based Data Access: The Case of Consistency
Console, Marco (Sapienza University of Rome) | Lenzerini, Maurizio (Sapienza University of Rome)
Ontology-based data access (OBDA) is a new paradigm aiming at accessing and managing data by means of an ontology, i.e., a conceptual representation of the domain of interest in the underlying information system. In recent years, this paradigm has been used to provide users with abstract (independent from technological and system-oriented aspects), effective, and reasoning-intensive mechanisms for querying the data residing at the information system sources. In this paper we argue that OBDA, besides querying data, provides the right principles for devising a formal approach to data quality. In particular, we concentrate on one of the most important dimensions considered both in the literature and in the practice of data quality, namely consistency. We define a general framework for data consistency in OBDA, and present algorithms and a complexity analysis for several relevant tasks related to checking data quality under this dimension, both at the extensional level (the content of the data sources) and at the intensional level (the schema of the data sources).
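A minimal sketch, in LaTeX notation and with hypothetical names, of how consistency can fail at the extensional level in an OBDA setting (it illustrates the dimension under study, not the formal framework of the paper):

    % TBox: students and professors are disjoint
    \mathit{Student} \sqsubseteq \neg\mathit{Professor}
    % Mappings from source relations to the ontology
    \mathit{emp}(x, \text{'student'}) \rightsquigarrow \mathit{Student}(x)
    \mathit{teaches}(x, y) \rightsquigarrow \mathit{Professor}(x)
    % Source data
    \mathit{emp}(\mathit{alice}, \text{'student'}), \quad \mathit{teaches}(\mathit{alice}, \mathit{kr101})

The facts retrieved through the mappings, Student(alice) and Professor(alice), contradict the disjointness axiom, so the source data are inconsistent with the ontology even though the TBox and the mappings are individually well formed.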
Capturing Relational Schemas and Functional Dependencies in RDFS
Calvanese, Diego (Free University of Bozen-Bolzano) | Fischl, Wolfgang (Vienna University of Technology) | Pichler, Reinhard (Vienna University of Technology) | Sallinger, Emanuel (Vienna University of Technology) | Simkus, Mantas (Vienna University of Technology)
Mapping relational data to RDF is an important task for the development of the Semantic Web. To this end, the W3C has recently released a Recommendation for the so-called direct mapping of relational data to RDF. In this work, we propose an enrichment of the direct mapping that makes it more faithful by also transferring the semantic information present in the relational schema from the relational world to the RDF world. We thus introduce expressive identification constraints to capture functional dependencies and define an RDF Normal Form, which precisely captures the classical Boyce-Codd Normal Form of relational schemas.
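As a rough illustration of the W3C direct mapping that this enrichment builds on, the Python sketch below turns relational rows into RDF-style triples; the table, columns, and IRI scheme are hypothetical and simplified (the actual Recommendation prescribes percent-encoded IRIs, typed literals, and foreign-key handling). A functional dependency such as dept -> building is exactly the kind of schema information that is invisible in the triples alone and that identification constraints are meant to recover.

    # Hedged sketch of the direct-mapping idea (simplified IRIs, hypothetical schema).
    # Each row becomes a subject IRI built from its primary key; each non-key
    # column becomes a predicate; foreign keys would become links between subjects.
    BASE = "http://example.org/db/"

    def direct_map(table, pk, rows):
        """Yield (subject, predicate, object) triples for one relational table."""
        for row in rows:
            subject = f"{BASE}{table}/{pk}={row[pk]}"
            yield (subject, "rdf:type", f"{BASE}{table}")
            for column, value in row.items():
                if column != pk:
                    yield (subject, f"{BASE}{table}#{column}", value)

    employees = [
        {"id": 1, "name": "Alice", "dept": "KR", "building": "B1"},
        {"id": 2, "name": "Bob", "dept": "KR", "building": "B1"},
    ]

    for triple in direct_map("Employee", "id", employees):
        print(triple)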
How Long Will It Take? Accurate Prediction of Ontology Reasoning Performance
Kang, Yong-Bin (Monash University) | Pan, Jeff Z. (University of Aberdeen) | Krishnaswamy, Shonali (Institute for Infocomm Research) | Sawangphol, Wudhichart (Monash University) | Li, Yuan-Fang (Monash University)
For expressive ontology languages such as OWL 2 DL, classification is a computationally expensive task—2NEXPTIME-complete in the worst case. Hence, it is highly desirable to be able to accurately estimate classification time, especially for large and complex ontologies. Recently, machine learning techniques have been successfully applied to predicting the reasoning hardness category for a given (ontology, reasoner) pair. In this paper, we further develop predictive models to estimate actual classification time using regression techniques, with ontology metrics as features. Our large-scale experiments on 6 state-of-the-art OWL 2 DL reasoners and more than 450 significantly diverse ontologies demonstrate that the prediction models achieve high accuracy, good generalizability and statistical significance. Such prediction models have a wide range of applications. We demonstrate how they can be used to efficiently and accurately identify performance hotspots in a large and complex ontology, an otherwise very time-consuming and resource-intensive task.
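A minimal sketch, under assumptions, of the kind of regression setup described above: ontology metrics as features and measured classification times as the target. The metric names, the synthetic data, and the choice of a random-forest regressor are illustrative stand-ins, not the authors' actual feature set or models.

    # Hedged sketch: predict classification time from ontology metrics (scikit-learn).
    # Features and timings below are synthetic and purely illustrative.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Hypothetical per-ontology metrics: #classes, #axioms, max depth, #properties.
    X = rng.random((450, 4)) * [10_000, 100_000, 20, 50]
    # Synthetic "measured" classification times (seconds) for one reasoner.
    y = 0.001 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 5, size=450)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    print("R^2 on held-out ontologies:", r2_score(y_test, model.predict(X_test)))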
Converting Instance Checking to Subsumption: A Rethink for Object Queries over Practical Ontologies
Xu, Jia (University of Miami) | Visser, Ubbo (University of Miami) | Kabuka, Mansur (University of Miami)
Instance checking is considered a central service for data retrieval from description logic (DL) ontologies. In this paper, we propose a revised most specific concept (MSC) method for the DL SHI, which converts instance checking into subsumption problems. The revised method generates small concepts that are specific enough to answer a given query, and allows reasoning to explore only a subset of the ABox data in order to achieve efficiency. Experiments show the effectiveness of the proposed method in terms of concept size reduction and improvement in reasoning efficiency.
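For context, the classical reduction behind MSC-based instance checking can be stated as follows (a LaTeX sketch of the underlying idea only; the revised method computes smaller, query-specific concepts and copes with the fact that exact MSCs need not exist as finite concepts in expressive DLs such as SHI):

    \mathcal{K} = (\mathcal{T}, \mathcal{A}) \models C(a)
    \quad\text{iff}\quad
    \mathcal{T} \models \mathrm{MSC}_{\mathcal{K}}(a) \sqsubseteq C,

where MSC_K(a) is a most specific concept of a in K: a concept that K entails a to belong to and that is subsumed, relative to T, by every other such concept.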
Contraction and Revision over DL-Lite TBoxes
Zhuang, Zhiqiang (Griffith University) | Wang, Zhe (Griffith University) | Wang, Kewen (Griffith University) | Qi, Guilin (Southeast University)
Two essential tasks in managing Description Logic (DL) ontologies are eliminating problematic axioms and incorporating newly formed axioms. Such elimination and incorporation are formalised as the operations of contraction and revision in belief change. In this paper, we deal with contraction and revision for the DL-Lite family through a model-theoretic approach. Standard DL semantics yields an infinite number of models for DL-Lite TBoxes, so it is not practical to develop algorithms for contraction and revision that involve DL models. The key to our approach is the introduction of an alternative semantics, called type semantics, which is more succinct than DL semantics. More importantly, with a finite signature, type semantics always yields a finite number of models. We then define model-based contraction and revision for DL-Lite TBoxes under type semantics and provide representation theorems for them. Finally, the succinctness of type semantics allows us to develop tractable algorithms for both operations.
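A small worked example of the contraction task itself (not of the type semantics introduced in the paper), in LaTeX notation with arbitrary concept names:

    \mathcal{T} = \{\, A \sqsubseteq B,\; B \sqsubseteq C \,\}
    \qquad\text{so}\qquad
    \mathcal{T} \models A \sqsubseteq C.

Contracting by the entailed inclusion of A in C must yield a TBox that no longer entails it, which forces giving up (or weakening) at least one of the two stated axioms. Model-based operators resolve this choice by selecting preferred models of the candidate outcomes; the point of type semantics is that, over a finite signature, only finitely many such models need to be considered.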
Towards Scalable Exploration of Diagnoses in an Ontology Stream
Diagnosis, or the process of identifying the nature and cause of an anomaly in an ontology, has been largely studied by the Semantic Web community. In the context of ontology streams, diagnosis results are not captured by a unique fixed ontology but by numerous time-evolving ontologies. Thus any anomaly can be diagnosed by a large number of different explanations, depending on the version and evolution of the ontology. We address the problems of identifying, representing, exploiting and exploring the evolution of diagnosis representations. Our approach consists in a graph-based representation, which aims at (i) efficiently organizing and linking time-evolving diagnoses and (ii) supporting scalable exploration. Experiments have shown scalable diagnosis exploration in the context of real, live data from Dublin City.
Pay-As-You-Go OWL Query Answering Using a Triple Store
Zhou, Yujiao (University of Oxford) | Nenov, Yavor (University of Oxford) | Grau, Bernardo Cuenca (University of Oxford) | Horrocks, Ian (University of Oxford)
We present an enhanced hybrid approach to OWL query answering that combines an RDF triple-store with an OWL reasoner in order to provide scalable pay-as-you-go performance. The enhancements presented here include an extension to deal with arbitrary OWL ontologies, and optimisations that significantly improve scalability. We have implemented these techniques in a prototype system, a preliminary evaluation of which has produced very encouraging results.
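The abstract does not spell out the architecture, so the following is only a generic sketch, under assumptions, of the pay-as-you-go pattern common to such hybrid systems: cheap sound and complete bounds on the answers are computed with the triple store, and the fully-fledged OWL reasoner is invoked only for the candidates left in the gap. The triple_store and owl_reasoner interfaces below are hypothetical.

    # Hedged, generic pay-as-you-go sketch (assumed pattern, hypothetical interfaces).
    def answer_query(query, triple_store, owl_reasoner):
        lower = set(triple_store.evaluate(query, mode="sound"))     # definite answers
        upper = set(triple_store.evaluate(query, mode="complete"))  # candidate superset
        certain = set(lower)
        for candidate in upper - lower:
            # Only undecided candidates pay the cost of full OWL reasoning.
            if owl_reasoner.entails(query, candidate):
                certain.add(candidate)
        return certain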
Cross-Lingual Knowledge Validation Based Taxonomy Derivation from Heterogeneous Online Wikis
Wang, Zhigang (Tsinghua University) | Li, Juanzi (Tsinghua University) | Li, Shuangjie (Tsinghua University) | Li, Mingyang (Tsinghua University) | Tang, Jie (Tsinghua University) | Zhang, Kuo (Sogou Inc.) | Zhang, Kun (Sogou Inc.)
Creating knowledge bases from crowd-sourced wikis, like Wikipedia, has attracted significant research interest in the field of the intelligent Web. However, the derived taxonomies usually contain many mistakenly imported taxonomic relations, owing to the difference between user-generated subsumption relations and semantic taxonomic relations. Current approaches to this problem still suffer from the following issues: (i) heuristic-based methods rely strongly on specific, language-dependent rules; (ii) corpus-based methods depend on a large-scale, high-quality corpus, which is often unavailable. In this paper, we formulate cross-lingual taxonomy derivation as the problem of cross-lingual taxonomic relation prediction. We investigate different linguistic heuristics and language-independent features, and propose a cross-lingual knowledge validation based dynamic adaptive boosting model to iteratively reinforce the performance of taxonomic relation prediction. The proposed approach successfully overcomes the above issues, and experiments show that it significantly outperforms state-of-the-art comparison methods.
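A minimal sketch of the boosting step in isolation, assuming precomputed language-independent features for candidate (category, subcategory) pairs and binary labels indicating whether a pair is a genuine taxonomic relation; the cross-lingual validation and the dynamic, iterative reinforcement described above are not reproduced here, and the data are synthetic.

    # Hedged sketch: boosted classification of candidate taxonomic relations.
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    rng = np.random.default_rng(1)
    X = rng.random((2000, 6))                  # e.g. string overlap, depth difference, ...
    y = (X[:, 0] + X[:, 3] > 1.0).astype(int)  # synthetic "is-a" labels

    clf = AdaBoostClassifier(n_estimators=100, random_state=1)
    clf.fit(X, y)
    print("training accuracy:", clf.score(X, y))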
A Tractable Approach to ABox Abduction over Description Logic Ontologies
Du, Jianfeng (Guangdong University of Foreign Studies) | Wang, Kewen (Griffith University) | Shen, Yi-Dong (Chinese Academy of Sciences)
ABox abduction is an important reasoning mechanism for description logic ontologies. It computes all minimal explanations (sets of ABox assertions) whose addition to a consistent ontology enforces the entailment of an observation while keeping the ontology consistent. We focus on practical computation for a general problem of ABox abduction, called the query abduction problem, where an observation is a Boolean conjunctive query and the explanations may contain fresh individuals appearing in neither the ontology nor the observation. In this problem, however, there can be infinitely many minimal explanations. Hence we first identify a class of TBoxes called first-order rewritable TBoxes, which guarantees the existence of finitely many minimal explanations and is sufficient for many ontology applications. To reduce the number of explanations that need to be computed, we introduce a special kind of minimal explanations, called representative explanations, from which all minimal explanations can be retrieved. We develop a tractable method (in data complexity) for computing all representative explanations in a consistent ontology. Experimental results demonstrate that the method is efficient and scalable for ontologies with large ABoxes.
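Following the wording of the abstract, the query abduction problem can be stated compactly (the notation is chosen here for illustration): given a consistent ontology O and an observation Q, a finite set E of ABox assertions, possibly over fresh individuals, is an explanation if

    \mathcal{O} \cup \mathcal{E} \models Q
    \qquad\text{and}\qquad
    \mathcal{O} \cup \mathcal{E} \not\models \bot,

and it is minimal if no proper subset of it is itself an explanation (minimality may in addition be taken up to renaming of fresh individuals). Representative explanations then form a subset from which all minimal explanations can be recovered.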
ARIA: Asymmetry Resistant Instance Alignment
Lee, Sanghoon (POSTECH) | Hwang, Seung-won (POSTECH)
We study the problem of instance alignment between knowledge bases (KBs). Existing approaches, which exploit the “symmetry” of structure and information across KBs, suffer in the presence of asymmetry, which is frequent because KBs are built independently. Specifically, we observe three types of asymmetry (in concepts, in features, and in structures). Our goal is to identify key techniques for reducing the accuracy loss caused by each type of asymmetry, and then to design the Asymmetry-Resistant Instance Alignment (ARIA) framework. ARIA uses two-phased blocking methods that take concept and feature asymmetries into account, together with a novel similarity measure that overcomes structure asymmetry. Compared to a state-of-the-art method, ARIA increased precision by 19% and recall by 2%, and decreased processing time by more than 80% when matching large-scale real-life KBs.
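A generic sketch, under assumptions, of the blocking-then-matching shape described above: two blocking phases restrict the candidate pairs, and a similarity measure scores the survivors. The blocking keys and the similarity function below are simple placeholders, not ARIA's asymmetry-resistant measures.

    # Hedged sketch: two-phase blocking followed by pairwise similarity scoring.
    # key1/key2 and similarity are placeholders for concept- and feature-level
    # blocking keys and for an asymmetry-aware similarity measure.
    from collections import defaultdict

    def block(instances, key_fn):
        buckets = defaultdict(list)
        for inst in instances:
            buckets[key_fn(inst)].append(inst)
        return buckets

    def align(kb1, kb2, key1, key2, similarity, threshold=0.8):
        matches = []
        blocks1, blocks2 = block(kb1, key1), block(kb2, key1)                  # phase 1
        for key in blocks1.keys() & blocks2.keys():
            sub1, sub2 = block(blocks1[key], key2), block(blocks2[key], key2)  # phase 2
            for k in sub1.keys() & sub2.keys():
                for x in sub1[k]:
                    for y in sub2[k]:
                        if similarity(x, y) >= threshold:
                            matches.append((x, y))
        return matches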