Learning Graphical Models
An ABC interpretation of the multiple auxiliary variable method
Prangle, Dennis, Everitt, Richard G.
Markov random fields (MRFs) have densities of the form f(y θ) γ(y θ)/Z(θ), (1) where γ(y θ) can be evaluated numerically but Z(θ) cannot in a reasonable time. This makes it challenging to perform inference. This note considers two approaches which both use simulation from f(y θ). The single auxiliary variable (SAV) method (Møller et al., 2006) and the multiple auxiliary variable (MAV) method (Murray et al., 2006) provide unbiased likelihood estimates. Approximate Bayesian computation (Marin et al., 2012) finds parameters which produce simulations similar to the observed data.
Scalable Discrete Sampling as a Multi-Armed Bandit Problem
Chen, Yutian, Ghahramani, Zoubin
Drawing a sample from a discrete distribution is one of the building components for Monte Carlo methods. Like other sampling algorithms, discrete sampling suffers from the high computational burden in large-scale inference problems. We study the problem of sampling a discrete random variable with a high degree of dependency that is typical in large-scale Bayesian inference and graphical models, and propose an efficient approximate solution with a subsampling approach. We make a novel connection between the discrete sampling and Multi-Armed Bandits problems with a finite reward population and provide three algorithms with theoretical guarantees. Empirical evaluations show the robustness and efficiency of the approximate algorithms in both synthetic and real-world large-scale problems.
Probabilistic Graphical Models on Multi-Core CPUs using Java 8
Masegosa, Andres R., Martinez, Ana M., Borchani, Hanen
In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs, using the new Java 8 functional programming features. In particular, we focus on probabilistic graphical models (PGMs) and present the parallelisation of a collection of algorithms that deal with inference and learning of PGMs from data. Namely, maximum likelihood estimation, importance sampling, and greedy search for solving combinatorial optimisation problems. Through these concrete examples, we tackle the problem of defining efficient data structures for PGMs and parallel processing of same-size batches of data sets using Java 8 features. We also provide straightforward techniques to code parallel algorithms that seamlessly exploit multi-core processors. The experimental analysis, carried out using our open source AMIDST (Analysis of MassIve Data STreams) Java toolbox, shows the merits of the proposed solutions.
5 skills You Need to Become a Machine Learning Engineer
The world is unquestionably changing in rapid and dramatic ways, and the demand for Machine Learning engineers is going to keep increasing exponentially. Now undoubtedly Machine Learning has arrived. To begin, there are two very important things that you should understand if you're considering a career as a Machine Learning engineer. You don't necessarily have to have a research or academic background. Second, it's not enough to have either software engineering or data science experience.
Data Science Learning Club Update
For anyone that hasn't yet joined the Becoming a Data Scientist Podcast Data Science Learning Club, I thought I'd write up a summary of what we've been doing! The first activity involved setting up a development environment. Some people are using R, some using python, and there are several different development tools represented. In this thread, several people posted what setup they were using. I posted a "hello world" program and the code to output the package versions.
Mixtures of Sparse Autoregressive Networks
We consider high-dimensional distribution estimation through autoregressive networks. By combining the concepts of sparsity, mixtures and parameter sharing we obtain a simple model which is fast to train and which achieves state-of-the-art or better results on several standard benchmark datasets. Specifically, we use an L1-penalty to regularize the conditional distributions and introduce a procedure for automatic parameter sharing between mixture components. Moreover, we propose a simple distributed representation which permits exact likelihood evaluations since the latent variables are interleaved with the observable variables and can be easily integrated out. Our model achieves excellent generalization performance and scales well to extremely high dimensions.
Train and Test Tightness of LP Relaxations in Structured Prediction
Meshi, Ofer, Mahdavi, Mehrdad, Weller, Adrian, Sontag, David
Structured prediction is used in areas such as computer vision and natural language processing to predict structured outputs such as segmentations or parse trees. In these settings, prediction is performed by MAP inference or, equivalently, by solving an integer linear program. Because of the complex scoring functions required to obtain accurate predictions, both learning and inference typically require the use of approximate solvers. We propose a theoretical explanation to the striking observation that approximations based on linear programming (LP) relaxations are often tight on real-world instances. In particular, we show that learning with LP relaxed inference encourages integrality of training instances, and that tightness generalizes from train to test data.
Top Data Mining Algorithms Identified by IEEE & Related Python Resources
IEEE International Conference on Data Mining identified 10 algorithms in 2006 using surveys from past winners and voting. This is a list of those algorithms a short description and related python resources. The detailed paper is given here. C4.5 is an algorithm used to generate a decision tree developed by Ross Quinlan. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a statistical classifier.
Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks
Albrecht, Stefano V., Ramamoorthy, Subramanian
Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e. the probability distribution over process states) based on incomplete and noisy observations. This can be a hard problem in complex processes with large state spaces. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF produces exact belief states under certain assumptions and approximate belief states otherwise, where the approximation error is bounded by the degree of uncertainty in the process. We show empirically, in synthetic processes with varying sizes and degrees of passivity, that PSBF is faster than several alternative methods while achieving competitive accuracy. Furthermore, we demonstrate how passivity occurs naturally in a complex system such as a multi-robot warehouse, and how PSBF can exploit this to accelerate the filtering task.