Markov Models
Schema Learning: Experience-Based Construction of Predictive Action Models
Holmes, Michael P., Jr., Charles
Schema learning is a way to discover probabilistic, constructivist, predictive actionmodels (schemas) from experience. It includes methods for finding and using hidden state to make predictions more accurate. Weextend the original schema mechanism [1] to handle arbitrary discrete-valued sensors, improve the original learning criteria to handle POMDP domains, and better maintain hidden state by using schema predictions. Theseextensions show large improvement over the original schema mechanism in several rewardless POMDPs, and achieve very low prediction error in a difficult speech modeling task. Further, we compare extended schema learning to the recently introduced predictive state representations [2],and find their predictions of next-step action effects to be approximately equal in accuracy. This work lays the foundation for a schema-based system of integrated learning and planning.
Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes
Teh, Yee W., Jordan, Michael I., Beal, Matthew J., Blei, David M.
We propose the hierarchical Dirichlet process (HDP), a nonparametric Bayesian model for clustering problems involving multiple groups of data. Each group of data is modeled with a mixture, with the number of components being open-ended and inferred automatically by the model. Further, components can be shared across groups, allowing dependencies across groups to be modeled effectively as well as conferring generalization tonew groups. Such grouped clustering problems occur often in practice, e.g. in the problem of topic discovery in document corpora. We report experimental results on three text corpora showing the effective and superior performance of the HDP over previous models.
Resolving Perceptual Aliasing In The Presence Of Noisy Sensors
Agents learning to act in a partially observable domain may need to overcome the problem of perceptual aliasing - i.e., different states that appear similar but require different responses. This problem is exacerbated whenthe agent's sensors are noisy, i.e., sensors may produce different observationsin the same state. We show that many well-known reinforcement learning methods designed to deal with perceptual aliasing, suchas Utile Suffix Memory, finite size history windows, eligibility traces, and memory bits, do not handle noisy sensors well. We suggest a new algorithm, Noisy Utile Suffix Memory (NUSM), based on USM, that uses a weighted classification of observed trajectories. We compare NUSM to the above methods and show it to be more robust to noise.
Coarticulation in Markov Decision Processes
Rohanimanesh, Khashayar, Platt, Robert, Mahadevan, Sridhar, Grupen, Roderic
We investigate an approach for simultaneously committing to multiple activities,each modeled as a temporally extended action in a semi-Markov decision process (SMDP). For each activity we define aset of admissible solutions consisting of the redundant set of optimal policies, and those policies that ascend the optimal statevalue functionassociated with them. A plan is then generated by merging them in such a way that the solutions to the subordinate activities are realized in the set of admissible solutions satisfying the superior activities.
VDCBPI: an Approximate Scalable Algorithm for Large POMDPs
Poupart, Pascal, Boutilier, Craig
Existing algorithms for discrete partially observable Markov decision processes can at best solve problems of a few thousand states due to two important sources of intractability: the curse of dimensionality and the policy space complexity. This paper describes a new algorithm (VDCBPI) that mitigates both sources of intractability by combining the V alue Directed Compression (VDC) technique [13] with Bounded Policy Iteration (BPI) [14]. The scalability of VDCBPI is demonstrated on synthetic network management problems with up to 33 million states. 1 Introduction Partially observable Markov decision processes (POMDPs) provide a natural and expressive framework for decision making, but their use in practice has been limited by the lack of scalable solution algorithms. T wo important sources of intractability plague discrete model-based POMDPs: high dimensionality of belief space, and the complexity of policy or value function (VF) space. Classic solution algorithms [4, 10, 7], for example, compute value functions represented by exponentially many value vectors, each of exponential size. As a result, they can only solve POMDPs with on the order of 100 states.
Approximately Efficient Online Mechanism Design
Parkes, David C., Yanovsky, Dimah, Singh, Satinder P.
Online mechanism design (OMD) addresses the problem of sequential decision making in a stochastic environment with multiple self-interested agents. The goal in OMD is to make value-maximizing decisions despite this self-interest. In previous work we presented a Markov decision process (MDP)-basedapproach to OMD in large-scale problem domains. In practice the underlying MDP needed to solve OMD is too large and hence the mechanism must consider approximations. This raises the possibility thatagents may be able to exploit the approximation for selfish gain. We adopt sparse-sampling-based MDP algorithms to implement ษ- efficient policies, and retain truth-revelation as an approximate Bayesian-Nash equilibrium. Our approach is empirically illustrated in the context of the dynamic allocation of WiFi connectivity to users in a coffeehouse.
Conditional Models of Identity Uncertainty with Application to Noun Coreference
McCallum, Andrew, Wellner, Ben
Coreference analysis, also known as record linkage or identity uncertainty, isa difficult and important problem in natural language processing, databases, citation matching and many other tasks. This paper introduces severaldiscriminative, conditional-probability models for coreference analysis,all examples of undirected graphical models. Unlike many historical approaches to coreference, the models presented here are relational--they do not assume that pairwise coreference decisions should be made independently from each other. Unlike other relational models of coreference that are generative, the conditional model here can incorporate a great variety of features of the input without having to be concerned about their dependencies--paralleling the advantages of conditional randomfields over hidden Markov models.
Planning for Markov Decision Processes with Sparse Stochasticity
Likhachev, Maxim, Thrun, Sebastian, Gordon, Geoffrey J.
Planning algorithms designed for deterministic worlds, such as A* search, usually run much faster than algorithms designed for worlds with uncertain action outcomes, such as value iteration. Real-world planning problems often exhibit uncertainty, which forces us to use the slower algorithms to solve them. Many real-world planning problems exhibit sparse uncertainty: there are long sequences of deterministic actions which accomplish tasks like moving sensor platforms into place, interspersed witha small number of sensing actions which have uncertain outcomes. In this paper we describe a new planning algorithm, called MCP (short for MDP Compression Planning), which combines A* search with value iteration for solving Stochastic Shortest Path problem in MDPs with sparse stochasticity. We present experiments which show that MCP can run substantially faster than competing planners in domains with sparse uncertainty; these experiments are based on a simulation of a ground robot cooperating with a helicopter to fill in a partial map and move to a goal location.
Integrating Topics and Syntax
Griffiths, Thomas L., Steyvers, Mark, Blei, David M., Tenenbaum, Joshua B.
Statistical approaches to language learning typically focus on either short-range syntactic dependencies or long-range semantic dependencies between words. We present a generative model that uses both kinds of dependencies, and can be used to simultaneously find syntactic classes and semantic topics despite having no representation of syntax or semantics beyondstatistical dependency. This model is competitive on tasks like part-of-speech tagging and document classification with models that exclusively use short-and long-range dependencies respectively.