A Natural Policy Gradient

Dec-31-2002–Neural Information Processing Systems

Sham Kakade
Gatsby Computational Neuroscience Unit, 17 Queen Square, London, UK WC1N 3AR
http://www.gatsby.ucl.ac.uk
sham@gatsby.ucl.ac.uk

Abstract

We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.

1 Introduction

There has been a growing interest in direct policy-gradient methods for approximate planning in large Markov decision problems (MDPs). Unfortunately, the standard gradient descent rule is noncovariant. In this paper, we present a covariant gradient by defining a metric based on the underlying structure of the policy.
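The noncovariance mentioned in the introduction can be illustrated with a toy case. The sketch below is not taken from the paper: it assumes a single-state problem with two actions (a two-armed bandit), a policy pi(a=1) = sigmoid(scale * theta), and the Fisher information of that policy as the metric. The names `gradients` and `logit_step`, the `scale` parameter, and the reward values are hypothetical and chosen only for illustration. The same policy is written in two parameterizations (scale 1 and scale 2), and the update each gradient induces on the policy's logit is compared.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gradients(theta, scale, r0, r1):
    """Return (vanilla, natural) gradients of expected reward w.r.t. theta.

    Assumed toy setting: one state, two actions with rewards r0 and r1,
    and policy pi(a=1) = sigmoid(scale * theta).
    """
    p = sigmoid(scale * theta)
    # Expected reward J = p * r1 + (1 - p) * r0, so dJ/dtheta is:
    vanilla = scale * p * (1.0 - p) * (r1 - r0)
    # Fisher information E[(d log pi / d theta)^2] for this policy.
    fisher = scale ** 2 * p * (1.0 - p)
    # Natural gradient: inverse metric applied to the vanilla gradient.
    natural = vanilla / fisher
    return vanilla, natural


def logit_step(scale, step):
    """Change in the policy logit (scale * theta) when theta moves by step."""
    return scale * step


# Same policy (logit = 1.0) under two parameterizations.
r0, r1 = 0.0, 1.0
for scale, theta in [(1.0, 1.0), (2.0, 0.5)]:
    vanilla, natural = gradients(theta, scale, r0, r1)
    print(
        f"scale={scale}: vanilla logit change={logit_step(scale, vanilla):.4f}, "
        f"natural logit change={logit_step(scale, natural):.4f}"
    )
```

Under these assumptions the vanilla update changes the logit by a different amount depending on the parameterization, while the natural update changes it by r1 - r0 in both. In this toy case the natural step also does not shrink as the policy saturates, which is loosely in the spirit of the abstract's remark about moving toward a greedy action.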

artificial intelligence, gradient, machine learning


