Goto

Collaborating Authors

 Undirected Networks


Anytime State-Based Solution Methods for Decision Processes with non-Markovian Rewards

arXiv.org Artificial Intelligence

A popular approach to solving a decision process with non-Markovian rewards (NMRDP) is to exploit a compact representation of the reward function to automatically translate the NMRDP into an equivalent Markov decision process (MDP) amenable to our favorite MDP solution method. The contribution of this paper is a representation of non-Markovian reward functions and a translation into MDP aimed at making the best possible use of state-based anytime algorithms as the solution method. By explicitly constructing and exploring only parts of the state space, these algorithms are able to trade computation time for policy quality, and have proven quite effective in dealing with large MDPs. Our representation extends future linear temporal logic (FLTL) to express rewards. Our translation has the effect of embedding model-checking in the solution method. It results in an MDP of the minimal size achievable without stepping outside the anytime framework, and consequently in better policies by the deadline.


Decayed MCMC Filtering

arXiv.org Artificial Intelligence

Filtering---estimating the state of a partially observable Markov process from a sequence of observations---is one of the most widely studied problems in control theory, AI, and computational statistics. Exact computation of the posterior distribution is generally intractable for large discrete systems and for nonlinear continuous systems, so a good deal of effort has gone into developing robust approximation algorithms. This paper describes a simple stochastic approximation algorithm for filtering called {em decayed MCMC}. The algorithm applies Markov chain Monte Carlo sampling to the space of state trajectories using a proposal distribution that favours flips of more recent state variables. The formal analysis of the algorithm involves a generalization of standard coupling arguments for MCMC convergence. We prove that for any ergodic underlying Markov process, the convergence time of decayed MCMC with inverse-polynomial decay remains bounded as the length of the observation sequence grows. We show experimentally that decayed MCMC is at least competitive with other approximation algorithms such as particle filtering.


Value Function Approximation in Zero-Sum Markov Games

arXiv.org Artificial Intelligence

This paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. We present a generalization of the optimal stopping problem to a two-player simultaneous move Markov game. For this special problem, we provide stronger bounds and can guarantee convergence for LSTD and temporal difference learning with linear value function approximation. We demonstrate the viability of value function approximation for Markov games by using the Least squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem. 1 Introduction Markov games can be viewed as generalizations of both classical game theory and the Markov decision process (MDP) framework1. In this paper, we consider the twoplayer zero-sum case, in which two players make simultaneous decisions in the same environment with shared state information. The reward function and the state transition probabilities depend on the current state and the current agents' joint actions. The reward function in each state is the payoff matrix of a zero-sum game.


Reduction of Maximum Entropy Models to Hidden Markov Models

arXiv.org Artificial Intelligence

We show that maximum entropy (maxent) models can be modeled with certain kinds of HMMs, allowing us to construct maxent models with hidden variables, hidden state sequences, or other characteristics. The models can be trained using the forward-backward algorithm. While the results are primarily of theoretical interest, unifying apparently unrelated concepts, we also give experimental results for a maxent model with a hidden variable on a word disambiguation task; the model outperforms standard techniques.


Qualitative MDPs and POMDPs: An Order-Of-Magnitude Approximation

arXiv.org Artificial Intelligence

We develop a qualitative theory of Markov Decision Processes (MDPs) and Partially Observable MDPs that can be used to model sequential decision making tasks when only qualitative information is available. Our approach is based upon an order-of-magnitude approximation of both probabilities and utilities, similar to epsilon-semantics. The result is a qualitative theory that has close ties with the standard maximum-expected-utility theory and is amenable to general planning techniques.


Joint Training of Deep Boltzmann Machines

arXiv.org Machine Learning

We introduce a new method for training deep Boltzmann machines jointly. Prior methods require an initial learning pass that trains the deep Boltzmann machine greedily, one layer at a time, or do not perform well on classifi- cation tasks.


Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning

arXiv.org Artificial Intelligence

Many problems in sequential decision making and stochastic control often have natural multiscale structure: sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure, particularly beyond a single level of abstraction, has remained a longstanding challenge. We describe a fast multiscale procedure for repeatedly compressing, or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of sub-problems at different scales is automatically determined. Coarsened MDPs are themselves independent, deterministic MDPs, and may be solved using existing algorithms. The multiscale representation delivered by this procedure decouples sub-tasks from each other and can lead to substantial improvements in convergence rates both locally within sub-problems and globally across sub-problems, yielding significant computational savings. A second fundamental aspect of this work is that these multiscale decompositions yield new transfer opportunities across different problems, where solutions of sub-tasks at different levels of the hierarchy may be amenable to transfer to new problems. Localized transfer of policies and potential operators at arbitrary scales is emphasized. Finally, we demonstrate compression and transfer in a collection of illustrative domains, including examples involving discrete and continuous statespaces. Keywords: Markov decision processes, hierarchical reinforcement learning, transfer, multiscale analysis.


Random Input Sampling for Complex Models Using Markov Chain Monte Carlo

arXiv.org Machine Learning

Many random processes can be simulated as the output of a deterministic model accepting random inputs. Such a model usually describes a complex mathematical or physical stochastic system and the randomness is introduced in the input variables of the model. When the statistics of the output event are known, these input variables have to be chosen in a specific way for the output to have the prescribed statistics. Because the probability distribution of the input random variables is not directly known but dictated implicitly by the statistics of the output random variables, this problem is usually intractable for classical sampling methods. Based on Markov Chain Monte Carlo we propose a novel method to sample random inputs to such models by introducing a modification to the standard Metropolis-Hastings algorithm. As an example we consider a system described by a stochastic differential equation (sde) and demonstrate how sample paths of a random process satisfying this sde can be generated with our technique.


Sequence Transduction with Recurrent Neural Networks

arXiv.org Machine Learning

Many machine learning tasks can be expressed as the transformation---or \emph{transduction}---of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation since \emph{finding} the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for phoneme recognition are provided on the TIMIT speech corpus.


Learning and Detecting Patterns in Multi-Attributed Network Data

AAAI Conferences

Network analysis is a growing field across many domains, including computer vision, social media marketing, transportation networks, and intelligence analysis. The growing use of digital communication devices and platforms, as well as persistent surveillance sensors, has resulted in explosion of the quantity of data and stretched the abilities of current technologies to process this data and draw meaningful conclusions. Current tools either require significant levels of manual intervention (e.g., to prepare the data, to define patterns, or to draw conclusions from data) or are unable to generalize to new data sources and analysis needs. In this paper, we present automated solutions to two major problems in network analysis: (a) finding patterns in the network data that contains high levels of noise and irrelevant information; and (b) learning repetitive patterns and dependencies between entities and attributes. Our modeling framework represents network data using multi-attributed graphs that can encode various discrete and continuous features and relationships between network entities. The pattern search and learning model is based on probabilistic multi-attributed graph matching, and implemented using distributed message passing algorithms. Our algorithms achieved high accuracy rates in learning and finding patterns in the data, are flexible to new domains and data types, and scale to large datasets using the Map-Reduce framework.