Convergence Results For Q-Learning With Experience Replay

Liran Szlak, Ohad Shamir

arXiv.org Artificial Intelligence 

Q-learning is a well-known and commonly used algorithm for reinforcement learning. In recent years, a technique referred to as experience replay [9, 11] has been suggested as a mechanism to improve Q-learning by allowing the learner to store previous experiences and reuse them offline, as if they were examples freshly sampled from the world. Replaying past experiences in this way may help Q-learning converge to the optimal Q-values: it breaks the temporal and spatial correlation structure of experiences as they are sampled from the real world, allowing policy updates that do not depend on the current time and location in the Markov decision process. Moreover, experience replay improves the efficiency of data usage, since every experience is used for learning more than once. This is useful in situations where data acquisition is costly or difficult.
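The mechanism described above can be sketched as a tabular Q-learning loop with a replay buffer: each observed transition is stored, and updates are applied to minibatches sampled uniformly from the buffer rather than only to the most recent transition. The toy two-state MDP, buffer capacity, minibatch size, and learning-rate settings below are illustrative assumptions for the sketch, not taken from the paper.

```python
import random

random.seed(0)

# Illustrative hyperparameters (assumptions, not from the paper)
N_STATES, N_ACTIONS = 2, 2
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.2
BUFFER_CAP, BATCH = 500, 8


def step(s, a):
    """Toy deterministic MDP: action 1 in state 0 yields reward 1
    and moves to state 1; everything else yields reward 0 and
    returns to state 0."""
    if s == 0 and a == 1:
        return 1, 1.0
    return 0, 0.0


Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
replay = []  # buffer of (s, a, r, s') transitions

s = 0
for t in range(2000):
    # Epsilon-greedy action selection in the "real world"
    if random.random() < EPS:
        a = random.randrange(N_ACTIONS)
    else:
        a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
    s2, r = step(s, a)

    # Store the experience; evict the oldest when the buffer is full
    replay.append((s, a, r, s2))
    if len(replay) > BUFFER_CAP:
        replay.pop(0)

    # Replay: sample a minibatch of past transitions and apply the
    # standard Q-learning update to each, so updates do not depend on
    # the current time and location in the MDP.
    for (ps, pa, pr, ps2) in random.sample(replay, min(BATCH, len(replay))):
        target = pr + GAMMA * max(Q[ps2])
        Q[ps][pa] += ALPHA * (target - Q[ps][pa])

    s = s2

print(Q)
```

For this toy MDP the Bellman fixed point gives Q(0, 1) = 1 / (1 - GAMMA**2) ≈ 5.26, and the replayed updates drive the table toward it; each stored transition contributes to many updates, illustrating the data-efficiency point above.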