Large-scale probabilistic knowledge bases are becoming increasingly important in academia and industry alike. They are constantly extended with new data, powered by modern information extraction tools that associate probabilities with database tuples. In this paper, we revisit the semantics underlying such systems. In particular, the closed-world assumption of probabilistic databases, that facts not in the database have probability zero, clearly conflicts with their everyday use. To address this discrepancy, we propose an open-world probabilistic database semantics, which relaxes the probabilities of open facts to intervals. While still assuming a finite domain, this semantics can provide meaningful answers when some probabilities are not precisely known. For this open world setting, we propose an efficient evaluation algorithm for unions of conjunctive queries. Our open-world algorithm incurs no overhead compared to closed-world reasoning and runs in time linear in the size of the database for tractable queries. All other queries are #P-hard, implying a data complexity dichotomy between linear time and #P. For queries involving negation, however, open-world reasoning can become NP-, or even NP^PP-hard. Finally, we discuss additional knowledge representation layers that can further strengthen open-world reasoning about big uncertain data.
Increasing amounts of available data have led to a heightened need for representing large-scale probabilistic knowledge bases. One approach is to use a probabilistic database, a model with strong assumptions that allow for efficiently answering many interesting queries. Recent work on open-world probabilistic databases strengthens the semantics of these probabilistic databases by discarding the assumption that any information not present in the data must be false. While intuitive, these semantics are not sufficiently precise to give reasonable answers to queries. We propose overcoming these issues by using constraints to restrict this open world. We provide an algorithm for one class of queries, and establish a basic hardness result for another. Finally, we propose an efficient and tight approximation for a large class of queries.
Probabilistic databases (PDBs) are usually incomplete, e.g., containing only the facts that have been extracted from the Web with high confidence. However, missing facts are often treated as being false, which leads to unintuitive results when querying PDBs. Recently, open-world probabilistic databases (OpenPDBs) were proposed to address this issue by allowing probabilities of unknown facts to take any value from a fixed probability interval. In this paper, we extend OpenPDBs by Datalog /- ontologies, under which both upper and lower probabilities of queries become even more informative, enabling us to distinguish queries that were indistinguishable before. We show that the dichotomy between P and PP in (Open)PDBs can be lifted to the case of first-order rewritable positive programs (without negative constraints); and that the problem can become NP PP-complete, once negative constraints are allowed. We also propose an approximating semantics that circumvents the increase in complexity caused by negative constraints.
Lukasiewicz, Thomas (University of Oxford) | Martinez, Maria Vanina (Universidad Nacional del Sur-CONICET) | Predoiu, Livia (University of Oxford) | Simari, Gerardo I. (Universidad Nacional del Sur-CONICET)
We study the complexity of exchanging probabilistic data between ontology-based probabilistic databases. We consider the Datalog+/- family of languages as ontology and ontology mapping languages, and we assume different compact encodings of the probabilities of the probabilistic source databases via Boolean events. We provide an extensive complexity analysis of the problem of deciding the existence of a probabilistic (universal) solution for a given probabilistic source database relative to a (probabilistic) data exchange problem for the different languages considered.
A robust system for ontology-based data access should provide meaningful answers to queries even when the data conflicts with the ontology. This can be accomplished by adopting an inconsistency-tolerant semantics, with the consistent query answering (CQA) semantics being the most prominent example. Unfortunately, query answering under the CQA semantics has been shown to be computationally intractable, even when extremely simple ontology languages are considered. In this paper, we address this problem by proposing two new families of inconsistency-tolerant semantics which approximate the CQA semantics from above and from below and converge to it in the limit. We study the data complexity of conjunctive query answering under these new semantics, and show a general tractability result for all known first-order rewritable ontology languages. We also analyze the combined complexity of query answering for ontology languages of the DL-Lite family.