Convergence of Optimistic and Incremental Q-Learning

Dec-31-2002–Neural Information Processing Systems

The first is the widely used optimistic Q-learning, which initializes the Q-values to large values and then follows a greedy policy with respect to the Q-values. We show that setting the initial value sufficiently large guarantees convergence to an ε-optimal policy. The second is a new algorithm, incremental Q-learning, which gradually promotes the values of actions that are not taken. We show that incremental Q-learning converges, in the limit, to the optimal policy. Our incremental Q-learning algorithm can be viewed as a derandomization of ε-greedy Q-learning.

1 Introduction

One of the challenges of Reinforcement Learning is learning in an unknown environment.
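The two exploration schemes summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's exact formulation. The function names, the constant promotion step `bonus`, the fixed learning rate `alpha`, and the two-state toy environment `toy_step` are all assumptions made for illustration. The paper's conditions on how large the initial value must be, and on how promotions are scheduled, are not reproduced here.

```python
def greedy(values):
    """Index of the largest value; ties go to the lowest index."""
    return max(range(len(values)), key=lambda a: values[a])


def q_update(Q, s, a, r, s_next, alpha, gamma):
    """Standard one-step Q-learning update."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])


def optimistic_q_learning(step, n_states, n_actions, q_init, steps,
                          alpha=0.1, gamma=0.9, start=0):
    """Start every Q-value at q_init and always act greedily on Q."""
    Q = [[q_init] * n_actions for _ in range(n_states)]
    s = start
    for _ in range(steps):
        a = greedy(Q[s])
        r, s_next = step(s, a)
        q_update(Q, s, a, r, s_next, alpha, gamma)
        s = s_next
    return Q


def incremental_q_learning(step, n_states, n_actions, steps, bonus=0.01,
                           alpha=0.1, gamma=0.9, start=0):
    """Act greedily on Q plus a promotion term P.

    Assumption: P grows by a constant `bonus` for every action not taken
    in the current state and is reset to zero for the action taken.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    P = [[0.0] * n_actions for _ in range(n_states)]
    s = start
    for _ in range(steps):
        a = greedy([q + p for q, p in zip(Q[s], P[s])])
        r, s_next = step(s, a)
        q_update(Q, s, a, r, s_next, alpha, gamma)
        for b in range(n_actions):
            P[s][b] = 0.0 if b == a else P[s][b] + bonus
        s = s_next
    return Q


def toy_step(s, a):
    """Hypothetical deterministic 2-state environment: (reward, next_state).

    State 0: action 0 loops with reward 0.1, action 1 moves to state 1.
    State 1: action 1 loops with reward 1.0, action 0 moves back to state 0.
    """
    if s == 0:
        return (0.1, 0) if a == 0 else (0.0, 1)
    return (0.0, 0) if a == 0 else (1.0, 1)


Q_optimistic = optimistic_q_learning(toy_step, 2, 2, q_init=10.0, steps=1000)
Q_zero_init = optimistic_q_learning(toy_step, 2, 2, q_init=0.0, steps=1000)
Q_incremental = incremental_q_learning(toy_step, 2, 2, steps=20000)
```

On this toy problem, the purely greedy agent started from `q_init=0.0` never leaves state 0, because the small reward there is the first thing it sees. The optimistic start of 10.0, which equals the maximum reward divided by `1 - gamma`, leads the agent to the rewarding loop in state 1. The incremental variant reaches the same loop without any optimistic initialization, since the promotion term eventually forces every untried action to be taken.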

Country:
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - Massachusetts > Middlesex County
    - Belmont (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
  - Jordan (0.04)
  - Israel > Tel Aviv District
    - Tel Aviv (0.05)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
