Markov Models
Value Function Approximation in Zero-Sum Markov Games
Lagoudakis, Michail, Parr, Ron
This paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. We present a generalization of the optimal stopping problem to a two-player simultaneous move Markov game. For this special problem, we provide stronger bounds and can guarantee convergence for LSTD and temporal difference learning with linear value function approximation. We demonstrate the viability of value function approximation for Markov games by using the Least squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem. 1 Introduction Markov games can be viewed as generalizations of both classical game theory and the Markov decision process (MDP) framework1. In this paper, we consider the twoplayer zero-sum case, in which two players make simultaneous decisions in the same environment with shared state information. The reward function and the state transition probabilities depend on the current state and the current agents' joint actions. The reward function in each state is the payoff matrix of a zero-sum game.
Reduction of Maximum Entropy Models to Hidden Markov Models
We show that maximum entropy (maxent) models can be modeled with certain kinds of HMMs, allowing us to construct maxent models with hidden variables, hidden state sequences, or other characteristics. The models can be trained using the forward-backward algorithm. While the results are primarily of theoretical interest, unifying apparently unrelated concepts, we also give experimental results for a maxent model with a hidden variable on a word disambiguation task; the model outperforms standard techniques.
Qualitative MDPs and POMDPs: An Order-Of-Magnitude Approximation
We develop a qualitative theory of Markov Decision Processes (MDPs) and Partially Observable MDPs that can be used to model sequential decision making tasks when only qualitative information is available. Our approach is based upon an order-of-magnitude approximation of both probabilities and utilities, similar to epsilon-semantics. The result is a qualitative theory that has close ties with the standard maximum-expected-utility theory and is amenable to general planning techniques.
Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning
Bouvrie, Jake, Maggioni, Mauro
Many problems in sequential decision making and stochastic control often have natural multiscale structure: sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure, particularly beyond a single level of abstraction, has remained a longstanding challenge. We describe a fast multiscale procedure for repeatedly compressing, or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of sub-problems at different scales is automatically determined. Coarsened MDPs are themselves independent, deterministic MDPs, and may be solved using existing algorithms. The multiscale representation delivered by this procedure decouples sub-tasks from each other and can lead to substantial improvements in convergence rates both locally within sub-problems and globally across sub-problems, yielding significant computational savings. A second fundamental aspect of this work is that these multiscale decompositions yield new transfer opportunities across different problems, where solutions of sub-tasks at different levels of the hierarchy may be amenable to transfer to new problems. Localized transfer of policies and potential operators at arbitrary scales is emphasized. Finally, we demonstrate compression and transfer in a collection of illustrative domains, including examples involving discrete and continuous statespaces. Keywords: Markov decision processes, hierarchical reinforcement learning, transfer, multiscale analysis.
Random Input Sampling for Complex Models Using Markov Chain Monte Carlo
Mahmutoglu, A. Gokcen, Erdogan, Alper T., Demir, Alper
Many random processes can be simulated as the output of a deterministic model accepting random inputs. Such a model usually describes a complex mathematical or physical stochastic system and the randomness is introduced in the input variables of the model. When the statistics of the output event are known, these input variables have to be chosen in a specific way for the output to have the prescribed statistics. Because the probability distribution of the input random variables is not directly known but dictated implicitly by the statistics of the output random variables, this problem is usually intractable for classical sampling methods. Based on Markov Chain Monte Carlo we propose a novel method to sample random inputs to such models by introducing a modification to the standard Metropolis-Hastings algorithm. As an example we consider a system described by a stochastic differential equation (sde) and demonstrate how sample paths of a random process satisfying this sde can be generated with our technique.
Sequence Transduction with Recurrent Neural Networks
Many machine learning tasks can be expressed as the transformation---or \emph{transduction}---of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation since \emph{finding} the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for phoneme recognition are provided on the TIMIT speech corpus.
Learning and Detecting Patterns in Multi-Attributed Network Data
Levchuk, Georgiy (Aptima, Inc.) | Roberts, Jennifer (Aptima, Inc.) | Freeman, Jared (Aptima, Inc.)
Network analysis is a growing field across many domains, including computer vision, social media marketing, transportation networks, and intelligence analysis. The growing use of digital communication devices and platforms, as well as persistent surveillance sensors, has resulted in explosion of the quantity of data and stretched the abilities of current technologies to process this data and draw meaningful conclusions. Current tools either require significant levels of manual intervention (e.g., to prepare the data, to define patterns, or to draw conclusions from data) or are unable to generalize to new data sources and analysis needs. In this paper, we present automated solutions to two major problems in network analysis: (a) finding patterns in the network data that contains high levels of noise and irrelevant information; and (b) learning repetitive patterns and dependencies between entities and attributes. Our modeling framework represents network data using multi-attributed graphs that can encode various discrete and continuous features and relationships between network entities. The pattern search and learning model is based on probabilistic multi-attributed graph matching, and implemented using distributed message passing algorithms. Our algorithms achieved high accuracy rates in learning and finding patterns in the data, are flexible to new domains and data types, and scale to large datasets using the Map-Reduce framework.
Smart Home, The Next Generation: Closing the Gap between Users and Technology
Hwang, Amy (Univeristy of Toronto) | Hoey, Jesse (University of Waterloo)
In this paper we discuss the gap that exists between the caregivers of older adults attempting to age-in-place and sophisticated ”smart-home” systems that can sense the environment and provide assistance when needed. We argue that smart-home systems need to be customizable by end-users, and we present a general-purpose model for cognitive assistive technology that can be adapted to suit many different tasks, users and environments. Al- though we can provide mechanisms for engineers and designers to build and adapt smart-home systems based on this general-purpose model, these mechanisms are not easily understood by or sufficiently user-friendly for actual end users such as older adults and their care- givers. Our goal is therefore to study how to bridge the gap between the end-users and this technology. In this paper, we discuss our work on this problem from both sides: developing technology that is customizable and general-purpose, and studying user’s abilities and needs when it comes to building smart-home systems to help with activities of daily living. We show how a large gap still exists, and propose ideas for how to bridge the gap.
Temporal Autoencoding Restricted Boltzmann Machine
Häusler, Chris, Susemihl, Alex
Much work has been done refining and characterizing the receptive fields learned by deep learning algorithms. A lot of this work has focused on the development of Gabor-like filters learned when enforcing sparsity constraints on a natural image dataset. Little work however has investigated how these filters might expand to the temporal domain, namely through training on natural movies. Here we investigate exactly this problem in established temporal deep learning algorithms as well as a new learning paradigm suggested here, the Temporal Autoencoding Restricted Boltzmann Machine (TARBM).