Goto

Collaborating Authors

 Markov Models


Markov models for ocular fixation locations in the presence and absence of colour

arXiv.org Machine Learning

We propose to model the fixation locations of the human eye when observing a still image by a Markovian point process in R 2 . Our approach is data driven using k-means clustering of the fixation locations to identify distinct salient regions of the image, which in turn correspond to the states of our Markov chain. Bayes factors are computed as model selection criterion to determine the number of clusters. Furthermore, we demonstrate that the behaviour of the human eye differs from this model when colour information is removed from the given image.


Deep Learning in Neural Networks: An Overview

#artificialintelligence

What a wonderful treasure trove this paper is! Schmidhuber provides all the background you need to gain an overview of deep learning (as of 2014) and how we got there through the preceding decades. Starting from recent DL results, I tried to trace back the origins of relevant ideas through the past half century and beyond. The main part of the paper runs to 35 pages, and then there are 53 pages of references. Now, I know that many of you think I read a lot of papers – just over 200 a year on this blog – but if I did nothing but review these key works in the development of deep learning it would take me about 4.5 years to get through them at that rate! And when I'd finished I'd still be about 6 years behind the then current state of the art!


Elements of machine learning

@machinelearnbot

The official title of this free book available in PDF format is Machine Learning Cheat Sheet. See table of content screenshot below. The chapters 17 to 28 (the most interesting ones in my opinion) seem like a work in progress - I'm sure the authors intend to make them a bit bigger. For a more modern and applied book, get Dr Granville's book on data science.


Probabilistic Models over Weighted Orderings: Fixed-Parameter Tractable Variable Elimination

AAAI Conferences

Probabilistic models with weighted formulas, known as Markov models or log-linear models, are used in many domains. Recent models of weighted orderings between elements that have been proposed as flexible tools to express preferences under uncertainty, are also potentially useful in applications like planning, temporal reasoning, and user modeling. Their computational properties are very different from those of conventional Markov models; because of the transitivity of the “less than” relation, standard methods that exploit structure of the models, such as variable elimination, are not directly applicable, as there are no conditional independencies between the orderings within connected components. The best known algorithms for general inference inthese models are exponential in the number of statements. Here, we present the first algorithms that exploit the available structure. We begin with the special case of models in the form of chains; we present an exact O(n^3) algorithm, where n is the total number of elements. Next, we generalize this technique to models in which the set of statements are comprised of arbitrary sets of atomic weighted preference formulas (while the query and evidence are conjunctions of atomic preference formulas), and the resulting exact algorithm runs in time O(m * n^2 * n^c), where m is the number of preference formulas, n is the number of elements, and c is the maximum number of elements in a linear cut (which depends both on the structure of the model and the order in which the elements are processed)—therefore, this algorithm is tractable for cases in which c can be bounded to a low value. Finally, we report on the results of an empirical evaluation of both algorithms, showing how they scale with reasonably-sized models.


Weighted Rules under the Stable Model Semantics

AAAI Conferences

We introduce the concept of weighted rules under the stable model semantics following the log-linear models of Markov Logic. This provides versatile methods to overcome the deterministic nature of the stable model semantics, such as resolving inconsistencies in answer set programs, ranking stable models, associating probability to stable models, and applying statistical inference to computing weighted stable models. We also present formal comparisons with related formalisms, such as answer set programs, Markov Logic, ProbLog, and P-log.


Mobility Sequence Extraction and Labeling Using Sparse Cell Phone Data

AAAI Conferences

Human mobility modeling for either transportation system development or individual location based services has a tangible impact on people's everyday experience. In recent years cell phone data has received a lot of attention as a promising data source because of the wide coverage, long observation period, and low cost. The challenge in utilizing such data is how to robustly extract people's trip sequences from sparse and noisy cell phone data and endow the extracted trips with semantic meaning, i.e., trip purposes.In this study we reconstruct trip sequences from sparse cell phone records. Next we propose a Bayesian trip purpose classification method and compare it to a Markov random field based trip purpose clustering method, representing scenarios with and without labeled training data respectively. This procedure shows how the cell phone data, despite their coarse granularity and sparsity, can be turned into a low cost, long term, and ubiquitous sensor network for mobility related services.


A POMDP Formulation of Proactive Learning

AAAI Conferences

We cast the Proactive Learning (PAL) problem—Active Learning (AL) with multiple reluctant, fallible, cost-varying oracles—as a Partially Observable Markov Decision Process (POMDP). The agent selects an oracle at each time step to label a data point, while it maintains a belief over the true underlying correctness of its current dataset’s labels. The goal is to minimize labeling costs while considering the value of obtaining correct labels, thus maximizing final resultant classifier accuracy. We prove three properties that show our particular formulation leads to a structured and bounded-size set of belief points, enabling strong performance of point-based methods to solve the POMDP. Our method is compared with the original three algorithms proposed by Donmez and Carbonell and a simple baseline. We demonstrate that our approach matches or improves upon the original approach within five different oracle scenarios, each on two datasets. Finally, our algorithm provides a general, well-defined mathematical foundation to build upon.


Global Distant Supervision for Relation Extraction

AAAI Conferences

Machine learning approaches to relation extraction are typically supervised and require expensive labeled data. To break the bottleneck of labeled data, a promising approach is to exploit easily obtained indirect supervision knowledge – which we usually refer to as distant supervision (DS). However, traditional DS methods mostly only exploit one specific kind of indirect supervision knowledge – the relations/facts in a given knowledge base, thus often suffer from the problem of lack of supervision. In this paper, we propose a global distant supervision model for relation extraction, which can: 1) compensate the lack of supervision with a wide variety of indirect supervision knowledge; and 2) reduce the uncertainty in DS by performing joint inference across relation instances. Experimental results show that, by exploiting the consistency between relation labels, the consistency between relations and arguments, and the consistency between neighbor instances using Markov logic, our method significantly outperforms traditional DS approaches.


A Probabilistic Approach to Knowledge Translation

AAAI Conferences

In this paper, we focus on a novel knowledge reuse scenario where the knowledge in the source schema needs to be translated to a semantically heterogeneous target schema. We refer to this task as “knowledge translation” (KT). Unlike data translation and transfer learning, KT does not require any data from the source or target schema. We adopt a probabilistic approach to KT by representing the knowledge in the source schema, the mapping between the source and target schemas, and the resulting knowledge in the target schema all as probability distributions, specially using Markov random fields and Markov logic networks. Given the source knowledge and mappings, we use standard learning and inference algorithms for probabilistic graphical models to find an explicit probability distribution in the target schema that minimizes the Kullback-Leibler divergence from the implicit distribution. This gives us a compact probabilistic model that represents knowledge from the source schema as well as possible, respecting the uncertainty in both the source knowledge and the mapping. In experiments on both propositional and relational domains, we find that the knowledge obtained by KT is comparable to other approaches that require data, demonstrating that knowledge can be reused without data.


Incremental Stochastic Factorization for Online Reinforcement Learning

AAAI Conferences

A construct that has been receiving attention recently in reinforcement learning is stochastic factorization (SF), a particular case of non-negative factorization (NMF) in which the matrices involved are stochastic. The idea is to use SF to approximate the transition matrices of a Markov decision process (MDP). This is useful for two reasons. First, learning the factors of the SF instead of the transition matrices can reduce significantly the number of parameters to be estimated. Second, it has been shown that SF can be used to reduce the number of operations needed to compute an MDP's value function. Recently, an algorithm called expectation-maximization SF (EMSF) has been proposed to compute a SF directly from transitions sampled from an MDP. In this paper we take a closer look at EMSF. First, by exploiting the assumptions underlying the algorithm, we show that it is possible to reduce it to simple multiplicative update rules similar to the ones that helped popularize NMF. Second, we analyze the optimization process underlying EMSF and find that it minimizes a modified version of the Kullback-Leibler divergence that is particularly well-suited for learning a SF from data sampled from an arbitrary distribution. Third, we build on this improved understanding of EMSF to draw an interesting connection with NMF and probabilistic latent semantic analysis. We also exploit the simplified update rules to introduce a new version of EMSF that generalizes and significantly improves its precursor. This new algorithm provides a practical mechanism to control the trade-off between memory usage and computing time, essentially freeing the space complexity of EMSF from its dependency on the number of sample transitions. The algorithm can also compute its approximation incrementally, which makes it possible to use it concomitantly with the collection of data. This feature makes the new version of EMSF particularly suitable for online reinforcement learning. Empirical results support the utility of the proposed algorithm.