Learning Finite-State Controllers for Partially Observable Environments

Meuleau, Nicolas, Peshkin, Leonid, Kim, Kee-Eung, Kaelbling, Leslie Pack

Jan-23-2013, arXiv.org Artificial Intelligence

Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finite-state automata. In this paper, we extend Baird and Moore's VAPS algorithm to the problem of learning general finite-state automata. Because it performs stochastic gradient descent, this algorithm can be shown to converge to a locally optimal finite-state controller. We provide the details of the algorithm and then consider the question of under what conditions stochastic gradient descent will outperform exact gradient descent. We conclude with empirical results comparing the performance of stochastic and exact gradient descent, and showing the ability of our algorithm to extract the useful information contained in the sequence of past observations to compensate for the lack of observability at each time-step.
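To make the claim about memory concrete, here is a minimal sketch of a stochastic finite-state controller in Python. It is an illustration only, not the paper's VAPS-based learning algorithm. The softmax parameterization, the class and function names, and the toy "alternation" task are all assumptions introduced for this example. In the toy task every time-step yields the same observation, so a memoryless policy cannot know which action it took last, while a two-node controller can alternate actions by switching memory nodes. A gradient-based learner would adjust the preference tables; here they are set by hand.

```python
import math
import random


def softmax(prefs):
    """Turn a list of real-valued preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class FiniteStateController:
    """Stochastic finite-state controller (hypothetical parameterization).

    action_prefs[n][a]     preference for action a while in memory node n
    trans_prefs[n][o][n2]  preference for moving from node n to node n2
                           after receiving observation o
    """

    def __init__(self, action_prefs, trans_prefs, start_node=0, seed=0):
        self.action_prefs = action_prefs
        self.trans_prefs = trans_prefs
        self.node = start_node
        self.rng = random.Random(seed)

    def _sample(self, probs):
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def act(self):
        """Sample an action from the current memory node's distribution."""
        return self._sample(softmax(self.action_prefs[self.node]))

    def update(self, observation):
        """Sample the next memory node given the latest observation."""
        probs = softmax(self.trans_prefs[self.node][observation])
        self.node = self._sample(probs)


def rollout(controller, steps):
    """Run the controller in a toy world whose only observation is 0."""
    actions = []
    for _ in range(steps):
        actions.append(controller.act())
        controller.update(0)  # every time-step looks identical to the agent
    return actions


def count_alternations(actions):
    """Toy return: one unit each time the action differs from the previous one."""
    return sum(a != b for a, b in zip(actions, actions[1:]))


BIG = 50.0  # a large preference makes the softmax choice effectively deterministic


def make_alternator():
    """Two memory nodes: node 0 favors action 0, node 1 favors action 1."""
    return FiniteStateController(
        action_prefs=[[BIG, 0.0], [0.0, BIG]],
        trans_prefs=[[[0.0, BIG]], [[BIG, 0.0]]],  # always hop to the other node
    )


def make_memoryless():
    """One memory node: the same action distribution at every time-step."""
    return FiniteStateController(
        action_prefs=[[0.0, 0.0]],
        trans_prefs=[[[0.0]]],
    )


if __name__ == "__main__":
    print(rollout(make_alternator(), 6))
    print(count_alternations(rollout(make_memoryless(), 6)))
```

The memoryless controller can only mix the two actions at random, so it alternates on about half of the steps at best, whereas the two-node controller alternates on every step despite receiving no useful observation.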

algorithm, artificial intelligence, machine learning

Country:
- North America > United States > Massachusetts > Middlesex County (0.14)

Genre:
- Research Report > New Finding (0.68)

Industry:
- Government > Regional Government > North America Government > United States Government (0.61)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (1.00)
