join tree
InferF: Declarative Factorization of AI/ML Inferences over Joins
Chowdhury, Kanchan, Zhou, Lixi, Xie, Lulu, Fu, Xinwei, Zou, Jia
Real-world AI/ML workflows often apply inference computations to feature vectors joined from multiple datasets. To avoid the redundant AI/ML computations caused by repeated data records in the join's output, factorized ML has been proposed to decompose ML computations into sub-computations to be executed on each normalized dataset. However, there is insufficient discussion on how factorized ML could impact AI/ML inference over multi-way joins. To address the limitations, we propose a novel declarative InferF system, focusing on the factorization of arbitrary inference workflows represented as analyzable expressions over the multi-way joins. We formalize our problem to flexibly push down partial factorized computations to qualified nodes in the join tree to minimize the overall inference computation and join costs and propose two algorithms to resolve the problem: (1) a greedy algorithm based on a per-node cost function that estimates the influence on overall latency if a subset of factorized computations is pushed to a node, and (2) a genetic algorithm for iteratively enumerating and evaluating promising factorization plans. We implement InferF on Velox, an open-sourced database engine from Meta, evaluate it on real-world datasets, observed up to 11.3x speedups, and systematically summarized the factors that determine when factorized ML can benefit AI/ML inference workflows.
Selective Use of Yannakakis' Algorithm to Improve Query Performance: Machine Learning to the Rescue
Böhm, Daniela, Gottlob, Georg, Lanzinger, Matthias, Longo, Davide, Okulmus, Cem, Pichler, Reinhard, Selzer, Alexander
Query optimization has played a central role in database research for decades. However, more often than not, the proposed optimization techniques lead to a performance improvement in some, but not in all, situations. Therefore, we urgently need a methodology for designing a decision procedure that decides for a given query whether the optimization technique should be applied or not. In this work, we propose such a methodology with a focus on Yannakakis-style query evaluation as our optimization technique of interest. More specifically, we formulate this decision problem as an algorithm selection problem and we present a Machine Learning based approach for its solution. Empirical results with several benchmarks on a variety of database systems show that our approach indeed leads to a statistically significant performance improvement.
Towards Complete Causal Explanation with Expert Knowledge
Venkateswaran, Aparajithan, Perkovic, Emilija
We study the problem of restricting Markov equivalence classes of maximal ancestral graphs (MAGs) containing certain edge marks, which we refer to as expert knowledge. MAGs forming a Markov equivalence class can be uniquely represented by an essential ancestral graph. We seek to learn the restriction of the essential ancestral graph containing the proposed expert knowledge. Our contributions are several-fold. First, we prove certain properties for the entire Markov equivalence class including a conjecture from Ali et al. (2009). Second, we present three sound graphical orientation rules, two of which generalize previously known rules, for adding expert knowledge to an essential graph. We also show that some orientation rules of Zhang (2008) are not needed for restricting the Markov equivalence class with expert knowledge. We provide an algorithm for including this expert knowledge and show that our algorithm is complete in certain settings i.e., in these settings, the output of our algorithm is a restricted essential ancestral graph. We conjecture this algorithm is complete generally. Outside of our specified settings, we provide an algorithm for checking whether a graph is a restricted essential graph and discuss its runtime. This work can be seen as a generalization of Meek (1995).
Hype or Heuristic? Quantum Reinforcement Learning for Join Order Optimisation
Franz, Maja, Winker, Tobias, Groppe, Sven, Mauerer, Wolfgang
Identifying optimal join orders (JOs) stands out as a key challenge in database research and engineering. Owing to the large search space, established classical methods rely on approximations and heuristics. Recent efforts have successfully explored reinforcement learning (RL) for JO. Likewise, quantum versions of RL have received considerable scientific attention. Yet, it is an open question if they can achieve sustainable, overall practical advantages with improved quantum processors. In this paper, we present a novel approach that uses quantum reinforcement learning (QRL) for JO based on a hybrid variational quantum ansatz. It is able to handle general bushy join trees instead of resorting to simpler left-deep variants as compared to approaches based on quantum(-inspired) optimisation, yet requires multiple orders of magnitudes fewer qubits, which is a scarce resource even for post-NISQ systems. Despite moderate circuit depth, the ansatz exceeds current NISQ capabilities, which requires an evaluation by numerical simulations. While QRL may not significantly outperform classical approaches in solving the JO problem with respect to result quality (albeit we see parity), we find a drastic reduction in required trainable parameters. This benefits practically relevant aspects ranging from shorter training times compared to classical RL, less involved classical optimisation passes, or better use of available training data, and fits data-stream and low-latency processing scenarios. Our comprehensive evaluation and careful discussion delivers a balanced perspective on possible practical quantum advantage, provides insights for future systemic approaches, and allows for quantitatively assessing trade-offs of quantum approaches for one of the most crucial problems of database management systems.
IDEAL: A Software Package for Analysis of Influence Diagrams
Srinivas, Sampath, Breese, John S.
IDEAL (Influence Diagram Evaluation and Analysis in Lisp) is a software environment for creation and evaluation of belief networks and influence diagrams. IDEAL is primarily a research tool and provides an implementation of many of the latest developments in belief network and influence diagram evaluation in a unified framework. This paper describes IDEAL and some lessons learned during its development.
A Comparison of Lauritzen-Spiegelhalter, Hugin, and Shenoy-Shafer Architectures for Computing Marginals of Probability Distributions
Lepar, Vasilica, Shenoy, Prakash P.
In the last decade, several architectures have been proposed for exact computation of marginals using local computation. In this paper, we compare three architectures - Lauritzen-Spiegelhalter, Hugin, and Shenoy-Shafer - from the perspective of graphical structure for message propagation, message-passing scheme, computational efficiency, and storage efficiency.
Tree Projections and Structural Decomposition Methods: Minimality and Game-Theoretic Characterization
Greco, Gianluigi, Scarcello, Francesco
Tree projections provide a mathematical framework that encompasses all the various (purely) structural decomposition methods that have been proposed in the literature to single out classes of nearly-acyclic (hyper)graphs, such as the tree decomposition method, which is the most powerful decomposition method on graphs, and the (generalized) hypertree decomposition method, which is its natural counterpart on arbitrary hypergraphs. The paper analyzes this framework, by focusing in particular on "minimal" tree projections, that is, on tree projections without useless redundancies. First, it is shown that minimal tree projections enjoy a number of properties that are usually required for normal form decompositions in various structural decomposition methods. In particular, they enjoy the same kind of connection properties as (minimal) tree decompositions of graphs, with the result being tight in the light of the negative answer that is provided to the open question about whether they enjoy a slightly stronger notion of connection property, defined to speed-up the computation of hypertree decompositions. Second, it is shown that tree projections admit a natural game-theoretic characterization in terms of the Captain and Robber game. In this game, as for the Robber and Cops game characterizing tree decompositions, the existence of winning strategies implies the existence of monotone ones. As a special case, the Captain and Robber game can be used to characterize the generalized hypertree decomposition method, where such a game-theoretic characterization was missing and asked for. Besides their theoretical interest, these results have immediate algorithmic applications both for the general setting and for structural decomposition methods that can be recast in terms of tree projections.
Incremental Compilation of Bayesian networks
Flores, Julia M., Gamez, Jose A., Olesen, Kristian G.
Most methods of exact probability propagation in Bayesian networks do not carry out the inference directly over the network, but over a secondary structure known as a junction tree or a join tree (JT). The process of obtaining a JT is usually termed {sl compilation}. As compilation is usually viewed as a whole process; each time the network is modified, a new compilation process has to be carried out. The possibility of reusing an already existing JT, in order to obtain the new one regarding only the modifications in the network has received only little attention in the literature. In this paper we present a method for incremental compilation of a Bayesian network, following the classical scheme in which triangulation plays the key role. In order to perform incremental compilation we propose to recompile only those parts of the JT which can have been affected by the networks modifications. To do so, we exploit the technique OF maximal prime subgraph decomposition in determining the minimal subgraph(s) that have to be recompiled, and thereby the minimal subtree(s) of the JT that should be replaced by new subtree(s).We focus on structural modifications : addition and deletion of links and variables.
Join Tree Propagation Utilizing Both Arc Reversal and Variable Elimination
Butz, Cory James (University of Regina) | Konkel, Ken (University of Regina) | Lingras, Pawan (Saint Mary's University)
In this paper, we put forth the first join tree propagation algorithm that selectively applies either arc reversal (AR) or variable elimination (VE) to build the propagated messages. Our approach utilizes a recent method for identifying the propagated join tree messages \`{a} priori. When it is determined that precisely one message is to be constructed at a join tree node, VE is utilized to build this distribution; otherwise, AR is applied as it is better suited to construct multiple distributions passed between neighboring join tree nodes. Experimental results, involving evidence processing in seven real-world and one benchmark Bayesian network, empirically demonstrate that selectively applying VE and AR is faster than applying one of these methods exclusively on the entire network.
Logarithmic-Time Updates and Queries in Probabilistic Networks
Delcher, A. L., Grove, A. J., Kasif, S., Pearl, J.
Traditional databases commonly support efficient query and update procedures that operate in time which is sublinear in the size of the database. Our goal in this paper is to take a first step toward dynamic reasoning in probabilistic databases with comparable efficiency. We propose a dynamic data structure that supports efficient algorithms for updating and querying singly connected Bayesian networks. In the conventional algorithm, new evidence is absorbed in O(1) time and queries are processed in time O(N), where N is the size of the network. We propose an algorithm which, after a preprocessing phase, allows us to answer queries in time O(log N) at the expense of O(log N) time per evidence absorption. The usefulness of sub-linear processing time manifests itself in applications requiring (near) real-time response over large probabilistic databases. We briefly discuss a potential application of dynamic probabilistic reasoning in computational biology.