Zap Q-Learning

Dec-31-2017–Neural Information Processing Systems

The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that the transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two time-scale update equation for the matrix gain sequence. The analysis suggests that the approach will lead to stable and efficient computation even for non-ideal parameterized settings. Numerical experiments confirm the quick convergence, even in such non-ideal cases.

algorithm, artificial intelligence, reinforcement learning, (16 more...)

Neural Information Processing Systems

Dec-31-2017

Conferences PDF

Add feedback

Country:
- North America > United States
  - Florida > Alachua County
    - Gainesville (0.14)
  - New York (0.14)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Duplicate Docs Excel Report

Title
Zap Q-Learning
Zap Q-Learning

Similar Docs Excel Report more

Title	Similarity	Source
None found