Robbins-Monro conditions for persistent exploration learning strategies

Aug-1-2018 – arXiv.org Machine Learning

We formulate simple assumptions that imply the Robbins-Monro conditions for the $Q$-learning algorithm with a local learning rate, which depends on the number of visits to a particular state-action pair (the local clock) and on the number of iterations (the global clock). It is assumed that the Markov decision process is communicating and that the learning policy ensures persistent exploration. The restrictions are imposed on the functional dependence of the learning rate on the local and global clocks. The result partially confirms the conjecture of Bradtke (1994).
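To make the setting concrete, here is a minimal sketch of tabular $Q$-learning with a learning rate driven by a local clock. Everything specific in it is an assumption made for illustration and is not taken from the paper: the two-state MDP (`TRANSITIONS`, `REWARDS`), the epsilon-greedy policy standing in for a persistent exploration strategy, and the step size `1 / n**power`. The function names `learning_rate` and `q_learning` are hypothetical. The update itself is the standard $Q$-learning rule.

```python
import random

# Hypothetical two-state, two-action deterministic MDP (not from the paper).
# Action 0 always leads to state 0 and action 1 to state 1, so each state
# is reachable from the other, i.e. the MDP is communicating.
TRANSITIONS = [[0, 1], [0, 1]]       # TRANSITIONS[state][action] -> next state
REWARDS = [[0.0, 1.0], [0.0, 2.0]]   # REWARDS[state][action] -> reward


def learning_rate(local_clock, global_clock, power=0.75):
    """Assumed step size 1 / n**power, where n is the local clock.

    For 1/2 < power <= 1 the sequence 1 / n**power sums to infinity while
    its squares have a finite sum. The global clock is accepted for
    generality but is not used in this particular sketch.
    """
    return 1.0 / (local_clock ** power)


def q_learning(transitions, rewards, steps=20000, gamma=0.5,
               epsilon=0.2, seed=0):
    """Run tabular Q-learning and return (q_table, visit_counts)."""
    rng = random.Random(seed)
    n_states = len(transitions)
    n_actions = len(transitions[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    state = 0
    for global_clock in range(1, steps + 1):
        # Epsilon-greedy choice: every action keeps probability of at
        # least epsilon / n_actions, so exploration never stops.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])
        # Advance the local clock of the visited state-action pair.
        visits[state][action] += 1
        alpha = learning_rate(visits[state][action], global_clock)
        next_state = transitions[state][action]
        # Standard Q-learning update towards the one-step target.
        target = rewards[state][action] + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q, visits


if __name__ == "__main__":
    q_table, visit_counts = q_learning(TRANSITIONS, REWARDS)
    print("Q-table:", q_table)
    print("visit counts:", visit_counts)
```

In this sketch the Robbins-Monro conditions hold for a given pair only if that pair is visited infinitely often, which is why the communicating assumption and persistent exploration matter: together they keep every local clock growing. A rate that also depends on `global_clock` could be substituted in `learning_rate` without touching the rest of the loop.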

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > United States
  - New York (0.04)
  - Massachusetts
    - Hampshire County > Amherst (0.14)
    - Suffolk County > Boston (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Russia > Southern Federal District
    - Rostov Oblast > Rostov-on-Don (0.04)
  - Germany > Hesse
    - Darmstadt Region > Wiesbaden (0.04)
- Asia
  - Russia (0.04)
  - Middle East > Jordan (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.51)
