On the Complexity of Policy Iteration

Jan-23-2013–arXiv.org Artificial Intelligence

Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first such nontrivial, worst-case upper bounds on the number of iterations required by PI to converge to the optimal policy. Our analysis also sheds new light on the manner in which PI progresses through the space of policies.
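For readers unfamiliar with PI, the following is a minimal sketch of one common variant (exact policy evaluation followed by a greedy switch in every improvable state), counting the iterations until no state changes its action. It is not taken from the paper: the names `P`, `R`, `gamma`, and `policy_iteration`, and the two-state toy MDP, are illustrative assumptions.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def evaluate(policy, P, R, gamma):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi."""
    n = len(policy)
    a = [[(1.0 if s == t else 0.0) - gamma * P[s][policy[s]][t]
          for t in range(n)] for s in range(n)]
    b = [R[s][policy[s]] for s in range(n)]
    return solve_linear(a, b)


def policy_iteration(P, R, gamma, tol=1e-9):
    """Return (policy, values, iterations) for a finite discounted MDP.

    P[s][a][t] is the probability of moving from state s to state t
    under action a; R[s][a] is the expected immediate reward.
    """
    n = len(P)
    policy = [0] * n          # arbitrary initial policy (assumption)
    iterations = 0
    while True:
        iterations += 1
        v = evaluate(policy, P, R, gamma)
        changed = False
        for s in range(n):
            q = [R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n))
                 for a in range(len(P[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            # Switch only on a strict improvement so ties cannot cause cycling.
            if q[best] > q[policy[s]] + tol:
                policy[s] = best
                changed = True
        if not changed:
            return policy, v, iterations


# Hypothetical toy MDP: in each state, action 0 stays put and action 1
# moves deterministically to the other state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[1.0, 0.0],   # staying in state 0 pays 1, moving pays 0
     [2.0, 0.0]]   # staying in state 1 pays 2, moving pays 0
policy, values, iterations = policy_iteration(P, R, gamma=0.9)
```

With n states and k actions per state there are k^n deterministic policies, and because each iteration of this sketch strictly improves the policy, the iteration count can never exceed that number. That is the trivial, discount-independent bound against which nontrivial bounds are measured.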

artificial intelligence, iteration, machine learning

Country:
- North America > United States (0.93)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
