Scalable Score Computation for Learning Multinomial Bayesian Networks over Distributed Data
Rao, Praveen (University of Missouri-Kansas City) | Katib, Anas (University of Missouri-Kansas City) | Barnard, Kobus (University of Arizona) | Kamhoua, Charles (Air Force Research Lab) | Kwiat, Kevin (Air Force Research Lab) | Njilla, Laurent (Air Force Research Lab)
In this paper, we focus on the problem of learning a Bayesian network over distributed data stored in a commodity cluster. Specifically, we address the challenge of computing the scoring function over distributed data in a scalable manner, which is a fundamental task during learning. We propose a novel approach designed to achieve: (a) scalable score computation using the principle of gossiping; (b) lower resource consumption via a probabilistic approach for maintaining scores using the properties of a Markov chain; and (c) effective distribution of tasks during score computation (on large datasets) by combining well-known hashing techniques. Through theoretical analysis, we show that our approach is superior to a MapReduce-style computation in terms of communication bandwidth. It is also superior to the batch-style processing of MapReduce when scores must be recomputed as new data become available.
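To make the gossiping idea concrete, the sketch below shows one generic way that nodes in a cluster could agree on an aggregate of their local counts without a central reducer. It uses push-sum averaging, a well-known gossip protocol, which is substituted here purely for illustration. The abstract does not specify the authors' protocol, so this should not be read as their method. The function name `push_sum_average`, the round count, and the sample counts are all hypothetical.

```python
import random


def push_sum_average(local_values, rounds=100, seed=0):
    """Estimate the average of local_values at every node via push-sum gossip.

    Illustrative assumption: each list entry is the local count held by one
    node (for example, how often a given variable/parent configuration occurs
    in that node's data partition). This is a generic protocol, not the
    paper's algorithm.
    """
    rng = random.Random(seed)
    n = len(local_values)
    s = [float(v) for v in local_values]  # running "sum" mass per node
    w = [1.0] * n                         # running "weight" mass per node

    for _ in range(rounds):
        new_s = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            # Pick a uniformly random peer (possibly the node itself).
            peer = rng.randrange(n)
            half_s, half_w = s[i] / 2.0, w[i] / 2.0
            # Keep one half locally and push the other half to the peer,
            # so the total mass in the system is conserved.
            new_s[i] += half_s
            new_w[i] += half_w
            new_s[peer] += half_s
            new_w[peer] += half_w
        s, w = new_s, new_w

    # Each node's ratio s/w is its estimate of the global average.
    return [s[i] / w[i] for i in range(n)]


if __name__ == "__main__":
    counts = [12, 7, 0, 5, 9, 3, 4, 8]  # hypothetical per-node counts
    estimates = push_sum_average(counts, rounds=100, seed=1)
    # If the number of nodes is known, the average can be scaled to a global count.
    print([round(e * len(counts), 6) for e in estimates])
```

In this sketch each node exchanges a constant amount of data with one peer per round, which is the kind of property that makes gossip attractive compared with shipping all partial counts to a single reducer. How the aggregated counts feed into a particular scoring function is left out, since the abstract does not describe it.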
Feb-4-2017