The Value Iteration Algorithm is Not Strongly Polynomial for Discounted Dynamic Programming

Dec-19-2013–arXiv.org Artificial Intelligence

This note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow arbitrarily quickly with the size of the problem. In particular, the number of iterations can be exponential in the number of actions. Thus, unlike policy iterations, the value iteration algorithm is not strongly polynomial for discounted dynamic programming. Keywords: Markov Decision Process, value iteration, strongly polynomial, policy, algorithm 1. Introduction Value iterations, policy iterations, and linear programming are three major methods for computing optimal policies for Markov Decision Processes (MDPs) with expected total discounted rewards [1], [3, Chapter 6], also known under the name of discounted dynamic programming. As is well-known, policy iterations can be viewed as implementations of the simplex method applied to one of the two major linear programs used to solve MDPs; see e.g.

artificial intelligence, iteration, optimization problem, (11 more...)

arXiv.org Artificial Intelligence

Dec-19-2013

arXiv.org PDF

Add feedback

Country:
- North America > United States > New York (0.16)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found