Improved and Generalized Upper Bounds on the Complexity of Policy Iteration

Neural Information Processing Systems 

Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal γ-discounted policy. We consider two variations of PI: Howard's PI, which changes the actions in all states with a positive advantage, and Simplex-PI, which only changes the action in the state with maximal advantage. We show that Howard's PI terminates
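To make the two update rules concrete, the following is a minimal sketch of both variants on a finite MDP with exact policy evaluation via a linear solve. The function names (`policy_iteration`, `evaluate`), the tolerance, and the tie-breaking through `argmax` are illustrative assumptions, not specifications from the paper.

```python
import numpy as np

def evaluate(policy, P, r, gamma):
    """Exact policy evaluation: solve v = r_pi + gamma * P_pi v."""
    n = P.shape[0]
    P_pi = P[np.arange(n), policy]      # (n, n) transitions under the policy
    r_pi = r[np.arange(n), policy]      # (n,)   rewards under the policy
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def policy_iteration(P, r, gamma, variant="howard", tol=1e-10):
    """P: (n, m, n) transition tensor, r: (n, m) rewards.

    Returns the final policy and the number of iterations used.
    """
    n, m, _ = P.shape
    policy = np.zeros(n, dtype=int)
    for it in range(1, 10_000):
        v = evaluate(policy, P, r, gamma)
        q = r + gamma * (P @ v)          # (n, m) state-action values
        adv = q - v[:, None]             # advantage over the current policy
        if adv.max() <= tol:             # no positive advantage: policy is optimal
            return policy, it
        if variant == "howard":
            # Howard's PI: switch every state with a positive advantage
            # to a greedy action.
            improvable = adv.max(axis=1) > tol
            policy[improvable] = q[improvable].argmax(axis=1)
        else:
            # Simplex-PI: switch only the single state-action pair
            # with maximal advantage.
            s, a = np.unravel_index(adv.argmax(), adv.shape)
            policy[s] = a
    return policy, it

if __name__ == "__main__":
    # Small random MDP to compare the iteration counts of the two variants.
    rng = np.random.default_rng(0)
    n, m = 5, 3
    P = rng.random((n, m, n))
    P /= P.sum(axis=2, keepdims=True)    # normalize into transition probabilities
    r = rng.random((n, m))
    print(policy_iteration(P, r, 0.9, "howard"))
    print(policy_iteration(P, r, 0.9, "simplex"))
```

Both variants share the evaluation and greedy-improvement machinery; they differ only in how many states are updated per iteration, which is exactly the distinction the complexity bounds are about.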