Goto

Collaborating Authors

 Learning Graphical Models


Feature Selection with the R Package MXM: Discovering Statistically-Equivalent Feature Subsets

arXiv.org Machine Learning

The statistically equivalent signature (SES) algorithm is a method for feature selection inspired by the principles of constrained-based learning of Bayesian Networks. Most of the currently available feature-selection methods return only a single subset of features, supposedly the one with the highest predictive power. We argue that in several domains multiple subsets can achieve close to maximal predictive accuracy, and that arbitrarily providing only one has several drawbacks. The SES method attempts to identify multiple, predictive feature subsets whose performances are statistically equivalent. Under that respect SES subsumes and extends previous feature selection algorithms, like the max-min parent children algorithm. SES is implemented in an homonym function included in the R package MXM, standing for mens ex machina, meaning 'mind from the machine' in Latin. The MXM implementation of SES handles several data-analysis tasks, namely classification, regression and survival analysis. In this paper we present the SES algorithm, its implementation, and provide examples of use of the SES function in R. Furthermore, we analyze three publicly available data sets to illustrate the equivalence of the signatures retrieved by SES and to contrast SES against the state-of-the-art feature selection method LASSO. Our results provide initial evidence that the two methods perform comparably well in terms of predictive accuracy and that multiple, equally predictive signatures are actually present in real world data.


k-nearest neighbor algorithm using Python

@machinelearnbot

In machine learning, you may often wish to build predictors that allows to classify things into categories based on some set of associated values. For example, it is possible to provide a diagnosis to a patient based on data from previous patients. Many algorithms have been developed for automated classification, and common ones include random forests, support vector machines, Naïve Bayes classifiers, and many types of neural networks. To get a feel for how classification works, we take a simple example of a classification algorithm – k-Nearest Neighbours (kNN) – and build it from scratch in Python 2. You can use a mostly imperative style of coding, rather than a declarative/functional one with lambda functions and list comprehensions to keep things simple if you are starting with Python. Here, we will provide an introduction to the latter approach.


RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning

arXiv.org Machine Learning

Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials. In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap. Rather than designing a "fast" reinforcement learning algorithm, we propose to represent it as a recurrent neural network (RNN) and learn it from data. In our proposed method, RL$^2$, the algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm. The RNN receives all information a typical RL algorithm would receive, including observations, actions, rewards, and termination flags; and it retains its state across episodes in a given Markov Decision Process (MDP). The activations of the RNN store the state of the "fast" RL algorithm on the current (previously unseen) MDP. We evaluate RL$^2$ experimentally on both small-scale and large-scale problems. On the small-scale side, we train it to solve randomly generated multi-arm bandit problems and finite MDPs. After RL$^2$ is trained, its performance on new MDPs is close to human-designed algorithms with optimality guarantees. On the large-scale side, we test RL$^2$ on a vision-based navigation task and show that it scales up to high-dimensional problems.


Correlated Random Measures

arXiv.org Machine Learning

We develop correlated random measures, random measures where the atom weights can exhibit a flexible pattern of dependence, and use them to develop powerful hierarchical Bayesian nonparametric models. Hierarchical Bayesian nonparametric models are usually built from completely random measures, a Poisson-process based construction in which the atom weights are independent. Completely random measures imply strong independence assumptions in the corresponding hierarchical model, and these assumptions are often misplaced in real-world settings. Correlated random measures address this limitation. They model correlation within the measure by using a Gaussian process in concert with the Poisson process. With correlated random measures, for example, we can develop a latent feature model for which we can infer both the properties of the latent features and their dependency pattern. We develop several other examples as well. We study a correlated random measure model of pairwise count data. We derive an efficient variational inference algorithm and show improved predictive performance on large data sets of documents, web clicks, and electronic health records.


Computationally Efficient Target Classification in Multispectral Image Data with Deep Neural Networks

arXiv.org Artificial Intelligence

Detecting and classifying targets in video streams from surveillance cameras is a cumbersome, error-prone and expensive task. Often, the incurred costs are prohibitive for real-time monitoring. This leads to data being stored locally or transmitted to a central storage site for post-incident examination. The required communication links and archiving of the video data are still expensive and this setup excludes preemptive actions to respond to imminent threats. An effective way to overcome these limitations is to build a smart camera that transmits alerts when relevant video sequences are detected. Deep neural networks (DNNs) have come to outperform humans in visual classifications tasks. The concept of DNNs and Convolutional Networks (ConvNets) can easily be extended to make use of higher-dimensional input data such as multispectral data. We explore this opportunity in terms of achievable accuracy and required computational effort. To analyze the precision of DNNs for scene labeling in an urban surveillance scenario we have created a dataset with 8 classes obtained in a field experiment. We combine an RGB camera with a 25-channel VIS-NIR snapshot sensor to assess the potential of multispectral image data for target classification. We evaluate several new DNNs, showing that the spectral information fused together with the RGB frames can be used to improve the accuracy of the system or to achieve similar accuracy with a 3x smaller computation effort. We achieve a very high per-pixel accuracy of 99.1%. Even for scarcely occurring, but particularly interesting classes, such as cars, 75% of the pixels are labeled correctly with errors occurring only around the border of the objects. This high accuracy was obtained with a training set of only 30 labeled images, paving the way for fast adaptation to various application scenarios.


How Bayesian Inference Works

@machinelearnbot

Bayesian inference is a way to get sharper predictions from your data. It's particularly useful when you don't have as much data as you would like and want to juice every last bit of predictive strength from it. Although it is sometimes described with reverence, Bayesian inference isn't magic or mystical. And even though the math under the hood can get dense, the concepts behind it are completely accessible. In brief, Bayesian inference lets you draw stronger conclusions from your data by folding in what you already know about the answer. Bayesian inference is based on the ideas of Thomas Bayes, a nonconformist Presbyterian minister in London about 300 years ago. He wrote two books, one on theology, and one on probability. His work included his now famous Bayes Theorem in raw form, which has since been applied to the problem of inference, the technical term for educated guessing. The popularity of Bayes' ideas was aided immeasurably by another minister, Richard Price.


Data Science Dictionary

@machinelearnbot

The idea of cross-validation is to split the data into N subsets, to put one subset aside, to estimate parameters of the model from the remaining N-1 subsets, and to use the retained subset to estimate the error of the model. Such a process is repeated N times - with each of the N subsets being used as the validation set . Then the values of the errors obtained in such N steps are combined to provide the final estimate of the model error. The cross-validation is used in various classification and prediction procedures, such as regression analysis, discriminant analysis, neural networks and classification and regression trees (CART) . The goal is to improve the quality of the decision that is made from the outcome of the study on the basis of statistical methods, and to ensure that maximum information is obtained from scarce experimental data.


Introduction to Machine Learning for Developers

#artificialintelligence

Today's developers often hear about leveraging machine learning algorithms in order to build more intelligent applications, but many don't know where to start. One of the most important aspects of developing smart applications is to understand the underlying machine learning models, even if you aren't the person building them. Whether you are integrating a recommendation system into your app or building a chat bot, this guide will help you get started in understanding the basics of machine learning. This introduction to machine learning and list of resources is adapted from my October 2016 talk at ACT-W, a women's tech conference. While this is only a brief definition, machine learning means we can use statistical models and probabilistic algorithms to answer questions so we can make informative decisions based on our data.


Unifying Count-Based Exploration and Intrinsic Motivation

arXiv.org Machine Learning

We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across observations. Specifically, we focus on the problem of exploration in non-tabular reinforcement learning. Drawing inspiration from the intrinsic motivation literature, we use density models to measure uncertainty, and propose a novel algorithm for deriving a pseudo-count from an arbitrary density model. This technique enables us to generalize count-based exploration algorithms to the non-tabular case. We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels. We transform these pseudo-counts into intrinsic rewards and obtain significantly improved exploration in a number of hard games, including the infamously difficult Montezuma's Revenge.


Using Social Dynamics to Make Individual Predictions: Variational Inference with a Stochastic Kinetic Model

arXiv.org Machine Learning

Social dynamics is concerned primarily with interactions among individuals and the resulting group behaviors, modeling the temporal evolution of social systems via the interactions of individuals within these systems. In particular, the availability of large-scale data from social networks and sensor networks offers an unprecedented opportunity to predict state-changing events at the individual level. Examples of such events include disease transmission, opinion transition in elections, and rumor propagation. Unlike previous research focusing on the collective effects of social systems, this study makes efficient inferences at the individual level. In order to cope with dynamic interactions among a large number of individuals, we introduce the stochastic kinetic model to capture adaptive transition probabilities and propose an efficient variational inference algorithm the complexity of which grows linearly --- rather than exponentially --- with the number of individuals. To validate this method, we have performed epidemic-dynamics experiments on wireless sensor network data collected from more than ten thousand people over three years. The proposed algorithm was used to track disease transmission and predict the probability of infection for each individual. Our results demonstrate that this method is more efficient than sampling while nonetheless achieving high accuracy.