Motik, Boris
Goal-Driven Query Answering over First- and Second-Order Dependencies with Equality
Tsamoura, Efthymia, Motik, Boris
Query answering over data with dependencies plays a central role in most applications of dependencies. The problem is commonly solved by using a suitable variant of the chase algorithm to compute a universal model of the dependencies and the data and thus explicate all knowledge implicit in the dependencies. After this preprocessing step, an arbitrary conjunctive query over the dependencies and the data can be answered by evaluating it the computed universal model. If, however, the query to be answered is fixed and known in advance, computing the universal model is often inefficient as many inferences made during this process can be irrelevant to a given query. In such cases, a goal-driven approach, which avoids drawing unnecessary inferences, promises to be more efficient and thus preferable in practice. In this paper we present what we believe to be the first technique for goal-driven query answering over first- and second-order dependencies with equality reasoning. Our technique transforms the input dependencies so that applying the chase to the output avoids many inferences that are irrelevant to the query. The transformation proceeds in several steps, which comprise the following three novel techniques. First, we present a variant of the singularisation technique by Marnette [60] that is applicable to second-order dependencies and that corrects an incompleteness of a related formulation by ten Cate et al. [74]. Second, we present a relevance analysis technique that can eliminate from the input dependencies that provably do not contribute to query answers. Third, we present a variant of the magic sets algorithm [19] that can handle second-order dependencies with equality reasoning. We also present the results of an extensive empirical evaluation, which show that goal-driven query answering can be orders of magnitude faster than computing the full universal model.
On the Correspondence Between Monotonic Max-Sum GNNs and Datalog
Cucala, David Tena, Grau, Bernardo Cuenca, Motik, Boris, Kostylev, Egor V.
Although there has been significant interest in applying machine learning techniques to structured data, the expressivity (i.e., a description of what can be learned) of such techniques is still poorly understood. In this paper, we study data transformations based on graph neural networks (GNNs). First, we note that the choice of how a dataset is encoded into a numeric form processable by a GNN can obscure the characterisation of a model's expressivity, and we argue that a canonical encoding provides an appropriate basis. Second, we study the expressivity of monotonic max-sum GNNs, which cover a subclass of GNNs with max and sum aggregation functions. We show that, for each such GNN, one can compute a Datalog program such that applying the GNN to any dataset produces the same facts as a single round of application of the program's rules to the dataset. Monotonic max-sum GNNs can sum an unbounded number of feature vectors which can result in arbitrarily large feature values, whereas rule application requires only a bounded number of constants. Hence, our result shows that the unbounded summation of monotonic max-sum GNNs does not increase their expressive power. Third, we sharpen our result to the subclass of monotonic max GNNs, which use only the max aggregation function, and identify a corresponding class of Datalog programs.
Datalog Reasoning over Compressed RDF Knowledge Bases
Hu, Pan, Urbani, Jacopo, Motik, Boris, Horrocks, Ian
Materialisation is often used in RDF systems as a preprocessing step to derive all facts implied by given RDF triples and rules. Although widely used, materialisation considers all possible rule applications and can use a lot of memory for storing the derived facts, which can hinder performance. We present a novel materialisation technique that compresses the RDF triples so that the rules can sometimes be applied to multiple facts at once, and the derived facts can be represented using structure sharing. Our technique can thus require less space, as well as skip certain rule applications. Our experiments show that our technique can be very effective: when the rules are relatively simple, our system is both faster and requires less memory than prominent state-of-the-art RDF systems.
Consequence-Based Reasoning for Description Logics with Disjunctions and Number Restrictions
Bate, Andrew, Motik, Boris, Cuenca Grau, Bernardo, Tena Cucala, David, Simančík, František, Horrocks, Ian
Classification of description logic (DL) ontologies is a key computational problem in modern data management applications, so considerable effort has been devoted to the development and optimisation of practical reasoning calculi. Consequence-based calculi combine ideas from hypertableau and resolution in a way that has proved very effective in practice. However, existing consequence-based calculi can handle either Horn DLs (which do not support disjunction) or DLs without number restrictions. In this paper, we overcome this important limitation and present the first consequence-based calculus for deciding concept subsumption in the DL ALCHIQ+. Our calculus runs in exponential time assuming unary coding of numbers, and on ELH ontologies it runs in polynomial time. The extension to disjunctions and number restrictions is technically involved: we capture the relevant consequences using first-order clauses, and our inference rules adapt paramodulation techniques from first-order theorem proving. By using a well-known preprocessing step, the calculus can also decide concept subsumptions in SRIQ---a rich DL that covers all features of OWL 2 DL apart from nominals and datatypes. We have implemented our calculus in a new reasoner called Sequoia. We present the architecture of our reasoner and discuss several novel and important implementation techniques such as clause indexing and redundancy elimination. Finally, we present the results of an extensive performance evaluation, which revealed Sequoia to be competitive with existing reasoners. Thus, the calculus and the techniques we present in this paper provide an important addition to the repertoire of practical implementation techniques for description logic reasoning.
Stream Reasoning in Temporal Datalog
Ronca, Alessandro, Kaminski, Mark, Grau, Bernardo Cuenca, Motik, Boris, Horrocks, Ian
In recent years, there has been an increasing interest in extending traditional stream processing engines with logical, rule-based, reasoning capabilities. This poses significant theoretical and practical challenges since rules can derive new information and propagate it both towards past and future time points; as a result, streamed query answers can depend on data that has not yet been received, as well as on data that arrived far in the past. Stream reasoning algorithms, however, must be able to stream out query answers as soon as possible, and can only keep a limited number of previous input facts in memory. In this paper, we propose novel reasoning problems to deal with these challenges, and study their computational properties on Datalog extended with a temporal sort and the successor function (a core rule-based language for stream reasoning applications).
Modular Materialisation of Datalog Programs
Hu, Pan, Motik, Boris, Horrocks, Ian
The semina\"ive algorithm can materialise all consequences of arbitrary datalog rules, and it also forms the basis for incremental algorithms that update a materialisation as the input facts change. Certain (combinations of) rules, however, can be handled much more efficiently using custom algorithms. To integrate such algorithms into a general reasoning approach that can handle arbitrary rules, we propose a modular framework for materialisation computation and its maintenance. We split a datalog program into modules that can be handled using specialised algorithms, and handle the remaining rules using the semina\"ive algorithm. We also present two algorithms for computing the transitive and the symmetric-transitive closure of a relation that can be used within our framework. Finally, we show empirically that our framework can handle arbitrary datalog programs while outperforming existing approaches, often by orders of magnitude.
Stratified Negation in Limit Datalog Programs
Kaminski, Mark, Grau, Bernardo Cuenca, Kostylev, Egor V., Motik, Boris, Horrocks, Ian
There has recently been an increasing interest in declarative data analysis, where analytic tasks are specified using a logical language, and their implementation and optimisation are delegated to a general-purpose query engine. Existing declarative languages for data analysis can be formalised as variants of logic programming equipped with arithmetic function symbols and/or aggregation, and are typically undecidable. In prior work, the language of $\mathit{limit\ programs}$ was proposed, which is sufficiently powerful to capture many analysis tasks and has decidable entailment problem. Rules in this language, however, do not allow for negation. In this paper, we study an extension of limit programs with stratified negation-as-failure. We show that the additional expressive power makes reasoning computationally more demanding, and provide tight data complexity bounds. We also identify a fragment with tractable data complexity and sufficient expressivity to capture many relevant tasks.
Goal-Driven Query Answering for Existential Rules With Equality
Benedikt, Michael (Oxford University) | Motik, Boris (Oxford University) | Tsamoura, Efthymia (Alan Turing Institute, Oxford University)
Inspired by the magic sets for Datalog, we present a novel goal-driven approach for answering queries over terminating existential rules with equality (aka TGDs and EGDs). Our technique improves the performance of query answering by pruning the consequences that are not relevant for the query. This is challenging in our setting because equalities can potentially affect all predicates in a dataset. We address this problem by combining the existing singularization technique with two new ingredients: an algorithm for identifying the rules relevant to a query and a new magic sets algorithm. We show empirically that our technique can significantly improve the performance of query answering, and that it can mean the difference between answering a query in a few seconds or not being able to process the query at all.
Optimised Maintenance of Datalog Materialisations
Hu, Pan (University of Oxford) | Motik, Boris (University of Oxford) | Horrocks, Ian (University of Oxford)
To efficiently answer queries, datalog systems often materialise all consequences of a datalog program, so the materialisation must be updated whenever the input facts change. Several solutions to the materialisation update problem have been proposed. The Delete/Rederive (DRed) and the Backward/Forward (B/F) algorithms solve this problem for general datalog, but both contain steps that evaluate rules "backwards" by matching their heads to a fact and evaluating the partially instantiated rule bodies as queries. We show that this can be a considerable source of overhead even on very small updates. In contrast, the Counting algorithm does not evaluate the rules "backwards," but it can handle only nonrecursive rules. We present two hybrid approaches that combine DRed and B/F with Counting so as to reduce or even eliminate "backward" rule evaluation while still handling arbitrary datalog programs. We show empirically that our hybrid algorithms are usually significantly faster than existing approaches, sometimes by orders of magnitude.
Stream Reasoning in Temporal Datalog
Ronca, Alessandro (University of Oxford) | Kaminski, Mark (University of Oxford) | Grau, Bernardo Cuenca (University of Oxford) | Motik, Boris (University of Oxford) | Horrocks, Ian (University of Oxford)
Consider a number of wind turbines scattered throughout the North Sea. Each turbine is equipped with a Query processing over data streams is a key aspect of Big sensor, which continuously records temperature levels of key Data applications. For instance, algorithmic trading relies on devices within the turbine and sends those readings to a data real-time analysis of stock tickers and financial news items centre monitoring the functioning of the turbines. Temperature (Nuti et al. 2011); oil and gas companies continuously monitor levels are streamed by sensors using a ternary predicate and analyse data coming from their wellsites in order Temp, whose arguments identify the device, the temperature to detect equipment malfunction and predict maintenance level, and the time of the reading. A monitoring task in the needs (Cosad et al. 2009); network providers perform realtime data centre is to track the activation of cooling measures in analysis of network flow data to identify traffic anomalies each turbine, record temperature-induced malfunctions and and DoS attacks (Münz and Carle 2007).