Convergence of Optimistic and Incremental Q-Learning
Even-dar, Eyal, Mansour, Yishay
–Neural Information Processing Systems
The first is the widely used optimistic Q-learning, which initializes the Q-values to large initial values and then follows a greedy policy with respect to the Q-values. We show that setting the initial value sufficiently large guarantees the converges to an E optimal policy. The second is a new and novel algorithm incremental Q-learning, which gradually promotes the values of actions that are not taken. We show that incremental Q-learning converges, in the limit, to the optimal policy. Our incremental Q-learning algorithm can be viewed as derandomization of the E-greedy Q-learning. 1 Introduction One of the challenges of Reinforcement Learning is learning in an unknown environment.
Neural Information Processing Systems
Dec-31-2002
- Country:
- North America > United States
- New York > New York County
- New York City (0.04)
- Massachusetts > Middlesex County
- Belmont (0.04)
- New York > New York County
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
- Jordan (0.04)
- Israel > Tel Aviv District
- Tel Aviv (0.05)
- North America > United States
- Technology: