Reinforcement Learning by Value Gradients
arXiv.org Artificial Intelligence
The concept of the value-gradient is introduced and developed in the context of reinforcement learning, for deterministic episodic control problems that use a function approximator and have a continuous state space. It is shown that by learning the value-gradients, instead of just the values themselves, exploration or stochastic behaviour is no longer needed to find locally optimal trajectories. This is the main motivation for using value-gradients, and it is argued that learning the value-gradients is the actual objective of any value-function learning algorithm for control problems. It is also argued that learning value-gradients is significantly more efficient than learning just the values, and this argument is supported in experiments by efficiency gains of several orders of magnitude, in several problem domains. Once value-gradients are introduced into learning, several analyses become possible. For example, a surprising equivalence between a value-gradient learning algorithm and a policy-gradient learning algorithm is proven, and this provides a robust convergence proof for control problems using a value function with a general function approximator. Also, the issue of whether to include 'residual gradient' terms in the weight update equations is addressed. Finally, an analysis is made of actor-critic architectures, which finds strong similarities to back-propagation through time, and gives simplifications and convergence proofs for certain actor-critic architectures, while also making those actor-critic architectures redundant. Unfortunately, by proving equivalence to policy-gradient learning, finding new divergence examples even in the absence of bootstrapping, and proving the redundancy of residual-gradients and actor-critic architectures in some circumstances, this paper does somewhat discredit the usefulness of a value-function.
Mar-25-2008
- Country:
  - North America > United States
    - New York > New York County
      - New York City (0.04)
    - Florida > Orange County
      - Orlando (0.04)
    - California
      - San Francisco County > San Francisco (0.14)
      - San Mateo County > San Mateo (0.04)
  - Europe
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
      - Oxfordshire > Oxford (0.04)
      - Greater London > London (0.04)
    - France > Auvergne-Rhône-Alpes
- Genre:
  - Research Report (0.63)
- Technology: