A Convergent Form of Approximate Policy Iteration

Dec-31-2003–Neural Information Processing Systems

We study a new, model-free form of approximate policy iteration that uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a "policy improvement operator" to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces ε-soft policies and is Lipschitz continuous in the action values, with a constant that is not too large, then the approximate policy iteration algorithm converges to a unique solution from any initial policy. To our knowledge, this is the first convergence result for any form of approximate policy iteration under similar computational-resource assumptions.
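The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the improvement operator `soft_improve` (a softmax mixed with the uniform distribution, which is ε-soft in the sense that every action keeps probability at least `eps / n_actions`, and is Lipschitz in the action values for a fixed temperature), the function names, the constant step size, and the fixed number of Sarsa steps per evaluation phase are all hypothetical choices. The abstract does not specify any of them, and the sketch makes no claim to satisfy the conditions of the convergence theorem.

```python
import numpy as np

def soft_improve(q, eps=0.1, temp=5.0):
    """Hypothetical improvement operator: softmax over action values,
    mixed with the uniform distribution so every action has
    probability at least eps / len(q)."""
    z = (q - q.max()) / temp          # shift for numerical stability
    p = np.exp(z)
    p /= p.sum()
    return eps / len(q) + (1.0 - eps) * p

def policy_from_weights(w, phi, eps=0.1, temp=5.0):
    """Apply the improvement operator to the linear action values phi @ w.
    phi has shape (n_states, n_actions, n_features)."""
    q = phi @ w                       # shape (n_states, n_actions)
    return np.array([soft_improve(q[s], eps, temp) for s in range(q.shape[0])])

def sarsa_evaluate(policy, w, phi, P, R, gamma, rng, steps=2000, alpha=0.05):
    """Policy evaluation: Sarsa(0) updates on a linear approximator while
    following the fixed policy. P[s, a] is a next-state distribution and
    R[s, a] an expected reward (both assumed known only to the simulator)."""
    n_states, n_actions, _ = phi.shape
    w = w.copy()
    s = rng.integers(n_states)
    a = rng.choice(n_actions, p=policy[s])
    for _ in range(steps):
        s2 = rng.choice(n_states, p=P[s, a])
        a2 = rng.choice(n_actions, p=policy[s2])
        td_error = R[s, a] + gamma * phi[s2, a2] @ w - phi[s, a] @ w
        w += alpha * td_error * phi[s, a]
        s, a = s2, a2
    return w

def approximate_policy_iteration(phi, P, R, gamma=0.9, iterations=20, seed=0):
    """Alternate Sarsa-based evaluation with the soft improvement operator."""
    rng = np.random.default_rng(seed)
    w = np.zeros(phi.shape[2])
    for _ in range(iterations):
        policy = policy_from_weights(w, phi)
        w = sarsa_evaluate(policy, w, phi, P, R, gamma, rng)
    return w, policy_from_weights(w, phi)
```

In this sketch, a larger `temp` makes the operator less sensitive to changes in the action values, which loosely corresponds to the abstract's requirement that the Lipschitz constant be "not too large"; the exact bound is given in the paper, not here.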

algorithm, artificial intelligence, reinforcement learning

Country:
- North America
  - Canada > Quebec
    - Montreal (0.14)
  - United States > Massachusetts
    - Hampshire County > Amherst (0.14)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.71)
    - Reinforcement Learning (1.00)
  - Representation & Reasoning (0.91)
