On-Line Policy Iteration for Infinite Horizon Dynamic Programming
arXiv.org Artificial Intelligence
Dimitri Bertsekas

Abstract: In this paper we propose an on-line policy iteration (PI) algorithm for finite-state infinite horizon discounted dynamic programming, whereby the policy improvement operation is done on-line, only for the states that are encountered during operation of the system. This allows the continuous updating/improvement of the current policy, thus resulting in a form of on-line PI that incorporates the improved controls into the current policy as new states and controls are generated. The algorithm converges in a finite number of stages to a type of locally optimal policy, and suggests the possibility of variants of PI and multiagent PI where the policy improvement is simplified. Moreover, the algorithm can be used with on-line replanning, and is also well-suited for on-line PI algorithms with value and policy approximations. The common characteristic of these variants is that, in addition to being suitable for on-line implementation, they are simplified in two ways: (a) They perform policy improvement operations only for the states that are encountered during the on-line operation of the system.
Jun-1-2021
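To make the idea in the abstract concrete, the following is a minimal sketch of on-line policy improvement for a finite-state, finite-control discounted problem: the policy is improved only at the states actually encountered while the system operates, and the improved control is immediately incorporated into the current policy. This is an illustrative reading of the abstract, not the paper's exact algorithm; the transition model p, stage cost g, discount factor alpha, the local update of the cost estimate J, and the function name online_policy_improvement are all assumptions introduced for the example.

```python
import numpy as np

def online_policy_improvement(p, g, alpha, x0, J, mu, num_steps, rng=None):
    """Operate the system from state x0 for num_steps stages.

    At each encountered state x, perform a one-step lookahead
    (policy improvement) using the current cost estimate J, store the
    improved control in the current policy mu, and apply it.

    p   : array (num_controls, n, n); p[u, x, y] = P(next state y | x, u)
    g   : array (n, num_controls); stage costs g(x, u)
    alpha : discount factor in (0, 1)
    J   : length-n cost estimate (e.g., the cost vector of an initial policy)
    mu  : length-n array of controls; the current policy, updated in place
    """
    rng = np.random.default_rng() if rng is None else rng
    n, num_controls = g.shape
    x = x0
    for _ in range(num_steps):
        # Policy improvement only at the state actually encountered.
        q = g[x, :] + alpha * p[:, x, :] @ J   # Q-values under the current J
        u = int(np.argmin(q))
        mu[x] = u                              # incorporate the improved control
        J[x] = q[u]                            # local cost update (an assumption of this sketch)
        # Simulate one transition under the improved control.
        x = int(rng.choice(n, p=p[u, x, :]))
    return mu, J
```

Consistent with simplification (a) in the abstract, states never visited during operation are left untouched: neither mu nor J is updated there, which is what makes the iteration inexpensive enough to run on-line.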