Scalable Score Computation for Learning Multinomial Bayesian Networks over Distributed Data

Rao, Praveen (University of Missouri-Kansas City) | Katib, Anas (University of Missouri-Kansas City) | Barnard, Kobus (University of Arizona) | Kamhoua, Charles (Air Force Research Lab) | Kwiat, Kevin (Air Force Research Lab) | Njilla, Laurent (Air Force Research Lab)

AAAI Conferences 

In this paper, we focus on the problem of learning a Bayesian network over distributed data stored in a commodity cluster. Specifically, we address the challenge of computing the scoring function over distributed data in a scalable manner, which is a fundamental task during learning. We propose a novel approach designed to achieve: (a) scalable score computation using the principle of gossiping; (b) lower resource consumption via a probabilistic approach for maintaining scores using the properties of a Markov chain; and (c) effective distribution of tasks during score computation (on large datasets) by synergistically combining well-known hashing techniques. Through theoretical analysis, we show that our approach is superior to a MapReduce-style computation in terms of communication bandwidth. Further, it is superior to the batch-style processing of MapReduce for recomputing scores when new data are available.
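
The abstract does not say which gossip protocol is used, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that a score term depends on the data through a count of some variable configuration, and that each cluster node holds a local count for that configuration. It uses push-sum averaging as a stand-in protocol: every node repeatedly keeps half of its (sum, weight) pair and sends the other half to a randomly chosen peer. Each node's ratio sum/weight then approaches the mean of the local counts, and multiplying by the number of nodes gives an estimate of the global count. The function name `push_sum_average`, its parameters, and the example counts are hypothetical and do not come from the paper.

```python
import random


def push_sum_average(local_counts, rounds=100, seed=0):
    """Estimate the mean of local_counts at every node via push-sum gossip.

    Illustrative assumption: this is a generic push-sum simulation, not the
    protocol from the paper. All nodes are simulated in one process.
    """
    rng = random.Random(seed)
    n = len(local_counts)
    sums = [float(c) for c in local_counts]  # running sum held by each node
    weights = [1.0] * n                      # running weight held by each node

    for _ in range(rounds):
        next_sums = [0.0] * n
        next_weights = [0.0] * n
        for i in range(n):
            peer = rng.randrange(n)  # random peer (may be the node itself)
            half_sum, half_weight = sums[i] / 2.0, weights[i] / 2.0
            # Keep one half locally ...
            next_sums[i] += half_sum
            next_weights[i] += half_weight
            # ... and push the other half to the chosen peer.
            next_sums[peer] += half_sum
            next_weights[peer] += half_weight
        sums, weights = next_sums, next_weights

    # Each node's local estimate of the mean count.
    return [s / w for s, w in zip(sums, weights)]


if __name__ == "__main__":
    # Hypothetical local counts of one configuration on 8 nodes.
    counts = [10, 0, 4, 6, 20, 0, 8, 0]
    estimates = push_sum_average(counts)
    # Node 0's estimate of the global count: mean estimate times node count.
    print(estimates[0] * len(counts))
```

In this sketch no node ever collects all local counts; each exchange carries only one (sum, weight) pair, which is the kind of property that makes gossip attractive when communication bandwidth is the concern.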
