Goto

Collaborating Authors

 Learning Graphical Models


The Bayesian SLOPE

arXiv.org Machine Learning

The SLOPE estimates regression coefficients by minimizing a regularized residual sum of squares using a sorted-$\ell_1$-norm penalty. The SLOPE combines testing and estimation in regression problems. It exhibits suitable variable selection and prediction properties, as well as minimax optimality. This paper introduces the Bayesian SLOPE procedure for linear regression. The classical SLOPE estimate is the posterior mode in the normal regression problem with an appropriate prior on the coefficients. The Bayesian SLOPE considers the full Bayesian model and has the advantage of offering credible sets and standard error estimates for the parameters. Moreover, the hierarchical Bayesian framework allows for full Bayesian and empirical Bayes treatment of the penalty coefficients; whereas it is not clear how to choose these coefficients when using the SLOPE on a general design matrix. A direct characterization of the posterior is provided which suggests a Gibbs sampler that does not involve latent variables. An efficient hybrid Gibbs sampler for the Bayesian SLOPE is introduced. Point estimation using the posterior mean is highlighted, which automatically facilitates the Bayesian prediction of future observations. These are demonstrated on real and synthetic data.


Statistical Properties of the Single Linkage Hierarchical Clustering Estimator

arXiv.org Machine Learning

Distance-based hierarchical clustering (HC) methods are widely used in unsupervised data analysis but few authors take account of uncertainty in the distance data. We incorporate a statistical model of the uncertainty through corruption or noise in the pairwise distances and investigate the problem of estimating the HC as unknown parameters from measurements. Specifically, we focus on single linkage hierarchical clustering (SLHC) and study its geometry. We prove that under fairly reasonable conditions on the probability distribution governing measurements, SLHC is equivalent to maximum partial profile likelihood estimation (MPPLE) with some of the information contained in the data ignored. At the same time, we show that direct evaluation of SLHC on maximum likelihood estimation (MLE) of pairwise distances yields a consistent estimator. Consequently, a full MLE is expected to perform better than SLHC in getting the correct HC results for the ground truth metric.


Making Data Science Accessible - Markov Chains

#artificialintelligence

A Markov chain is a random process with the property that the next state depends only on the current state. For example: If you have the choice of red or blue twice the process would be Markovian if each time you chose the decision had nothing to do with your choice previously (see diagram below). How can Markov Chains help us? To start with we need to define some basic terminology. The changes of state within the system are called transitions, and the probabilities associated with various state-changes are called transition probabilities.


The Three Faces of Bayes

#artificialintelligence

Last summer, I was at a conference having lunch with Hal Daume III when we got to talking about how "Bayesian" can be a funny and ambiguous term. It seems like the definition should be straightforward: "following the work of English mathematician Rev. Thomas Bayes," perhaps, or even "uses Bayes' theorem." But many methods bearing the reverend's name or using his theorem aren't even considered "Bayesian" by his most religious followers. Why is it that Bayesian networks, for example, aren't considered… y'know… Bayesian? As I've read more outside the fields of machine learning and natural language processing -- from psychometrics and environmental biology to hackers who dabble in data science -- I've noticed three distinct uses of the term "Bayesian."


Reinforcement Learning and DQN, learning to play from pixels - Ruben Fiszel's website

#artificialintelligence

My 2 month summer internship at Skymind (the company behind the open source deeplearning library DL4J) comes to an end and this is a post to summarize what I have been working on: Building a deep reinforcement learning library for DL4J: … (drums roll) … RL4J! This post begins by an introduction to reinforcement learning and is then followed by a detailed explanation of DQN (Deep Q-Network) for pixel inputs and is concluded by an RL4J example. I will assume from the reader some familiarity with neural networks. But first, lets talk about the core concepts of reinforcement learning. A "simple aspect of science" may be defined as one which, through good fortune, I happen to understand. Reinforcement Learning is an exciting area of machine learning. It is basically the learning of an efficient strategy in a given environment. Informally, this is very similar to Pavlovian conditioning: you assign a reward for a given behavior and over time, the agents learn to reproduce that behavior in order to receive more rewards. It is an iterative trial and error process. Formally, an environment is defined as a Markov Decision Process (MDP). Note: It is usually more convenient to use the set of Action \(A_s\) which is the set of available move from a given state, than the complete set A. \(A_s\) is simply the elements \(a\) in \(A\) such that \(P(s' s, a) 0\).


Superintelligence: Paths, Dangers, Strategies eBook: Nick Bostrom: Amazon.it: Kindle Store

#artificialintelligence

Prof. Bostrom has written a book that I believe will become a classic within that subarea of Artificial Intelligence (AI) concerned with the existential dangers that could threaten humanity as the result of the development of artificial forms of intelligence. What fascinated me is that Bostrom has approached the existential danger of AI from a perspective that, although I am an AI professor, I had never really examined in any detail. When I was a graduate student in the early 80s, studying for my PhD in AI, I came upon comments made in the 1960s (by AI leaders such as Marvin Minsky and John McCarthy) in which they mused that, if an artificially intelligent entity could improve its own design, then that improved version could generate an even better design, and so on, resulting in a kind of "chain-reaction explosion" of ever-increasing intelligence, until this entity would have achieved "superintelligence". This chain-reaction problem is the one that Bostrom focusses on. He sees three main paths to superintelligence: 1. The AI path -- In this path, all current (and future) AI technologies, such as machine learning, Bayesian networks, artificial neural networks, evolutionary programming, etc. are applied to bring about a superintelligence.


Visualizing and Understanding Sum-Product Networks

arXiv.org Machine Learning

Sum-Product Networks (SPNs) are recently introduced deep tractable probabilistic models by which several kinds of inference queries can be answered exactly and in a tractable time. Up to now, they have been largely used as black box density estimators, assessed only by comparing their likelihood scores only. In this paper we explore and exploit the inner representations learned by SPNs. We do this with a threefold aim: first we want to get a better understanding of the inner workings of SPNs; secondly, we seek additional ways to evaluate one SPN model and compare it against other probabilistic models, providing diagnostic tools to practitioners; lastly, we want to empirically evaluate how good and meaningful the extracted representations are, as in a classic Representation Learning framework. In order to do so we revise their interpretation as deep neural networks and we propose to exploit several visualization techniques on their node activations and network outputs under different types of inference queries. To investigate these models as feature extractors, we plug some SPNs, learned in a greedy unsupervised fashion on image datasets, in supervised classification learning tasks. We extract several embedding types from node activations by filtering nodes by their type, by their associated feature abstraction level and by their scope. In a thorough empirical comparison we prove them to be competitive against those generated from popular feature extractors as Restricted Boltzmann Machines. Finally, we investigate embeddings generated from random probabilistic marginal queries as means to compare other tractable probabilistic models on a common ground, extending our experiments to Mixtures of Trees.


Relevant based structure learning for feature selection

arXiv.org Machine Learning

Feature selection is an important task in many problems occurring in pattern recognition, bioinformatics, machine learning and data mining applications. The feature selection approach enables us to reduce the computation burden and the falling accuracy effect of dealing with huge number of features in typical learning problems. There is a variety of techniques for feature selection in supervised learning problems based on different selection metrics. In this paper, we propose a novel unified framework for feature selection built on the graphical models and information theoretic tools. The proposed approach exploits the structure learning among features to select more relevant and less redundant features to the predictive modeling problem according to a primary novel likelihood based criterion. In line with the selection of the optimal subset of features through the proposed method, it provides us the Bayesian network classifier without the additional cost of model training on the selected subset of features. The optimal properties of our method are established through empirical studies and computational complexity analysis. Furthermore the proposed approach is evaluated on a bunch of benchmark datasets based on the well-known classification algorithms. Extensive experiments confirm the significant improvement of the proposed approach compared to the earlier works.


An example Markov Decision Process model on setting rewards in a text sentence?

#artificialintelligence

I'm looking to build a MDP and a reinforcement program in Python which works on generating text based on reward points in training data which is text sentences. I could not find any basic example program models which gives idea on how to build MDP on text sentences training data or identify which reinforcement model to use to generate texts. Can you please provide some suggestions / advice.


On the Consistency of the Likelihood Maximization Vertex Nomination Scheme: Bridging the Gap Between Maximum Likelihood Estimation and Graph Matching

arXiv.org Machine Learning

Graphs are a common data modality, useful for modeling complex relationships between objects, with applications spanning fields as varied as biology (Jeong et al., 2001; Bullmore and Sporns, 2009), sociology (Wasserman and Faust, 1994), and computer vision (Foggia et al., 2014; Kandel et al., 2007), to name a few. For example, in neuroscience, vertices may be neurons and edges adjoin pairs of neurons that share a synapse (Bullmore and Sporns, 2009); in social networks, vertices may correspond to people and edges to friendships between them (Carrington et al., 2005; Yang and Leskovec, 2015); in computer vision, vertices may represent pixels in an image and edges may represent spatial proximity or multi-resolution mappings (Kandel et al., 2007). In many useful networks, vertices with similar attributes form densely-connected communities compared to vertices with highly disparate attributes, and uncovering these communities is an important step in understanding the structure of the network. There is an extensive literature devoted to uncovering this community structure in network data, including methods based on maximum modularity (Newman and Girvan, 2004; Newman, 2006b), spectral partitioning algorithms (Luxburg, 2007; Rohe et al., 2011; Sussman et al., 2012; Lyzinski et al., 2014b), and likelihood-based methods (Bickel and Chen, 2009), among others. In the setting of vertex nomination, one community in the network is of particular interest, and the inference task is to order the vertices into a nomination list with those vertices from the community of interest concentrating at the top of the list.