Open-World Probabilistic Databases

Ceylan, Ismail Ilkan (Technische Universität Dresden) | Darwiche, Adnan (University of California, Los Angeles) | Broeck, Guy Van den (University of California, Los Angeles)

AAAI Conferences 

Large-scale probabilistic knowledge bases are becoming increasingly important in academia and industry alike. They are constantly extended with new data, powered by modern information extraction tools that associate probabilities with database tuples. In this paper, we revisit the semantics underlying such systems. In particular, the closed-world assumption of probabilistic databases, that facts not in the database have probability zero, clearly conflicts with their everyday use. To address this discrepancy, we propose an open-world probabilistic database semantics, which relaxes the probabilities of open facts to intervals. While still assuming a finite domain, this semantics can provide meaningful answers when some probabilities are not precisely known. For this open world setting, we propose an efficient evaluation algorithm for unions of conjunctive queries. Our open-world algorithm incurs no overhead compared to closed-world reasoning and runs in time linear in the size of the database for tractable queries. All other queries are #P-hard, implying a data complexity dichotomy between linear time and #P. For queries involving negation, however, open-world reasoning can become NP-, or even NP^PP-hard. Finally, we discuss additional knowledge representation layers that can further strengthen open-world reasoning about big uncertain data.