A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle

Mar-22-2022–arXiv.org Machine Learning

Q-learning is one of the simplest yet most popular algorithms in the reinforcement learning (RL) community [Sutton and Barto, 2018]. However, Q-learning can diverge when (linear) function approximation is applied [Baird, 1995, Tsitsiklis and Van Roy, 1997]. To address this instability, the DQN algorithm [Mnih et al., 2015] introduced a technique called the target network. In particular, DQN maintains a copy of the main Q-network (the so-called target network), which is used to generate the bootstrap signal for updates. An important feature is that the target network is held fixed over intervals, so, unlike in Q-learning, the learning targets do not change during an interval. [Mnih et al., 2015, Table 3] reports that the target network contributes substantially to the superior performance of DQN.
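To make the fixed-target idea concrete, here is a minimal tabular sketch, assuming a finite MDP and a generative oracle that can sample a next state for any state-action pair. The function name `target_q_learning`, the 1/t step size, the synchronous sweep over all pairs, and the loop counts are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def target_q_learning(P, R, gamma=0.9, outer_iters=50, inner_iters=200, seed=0):
    """Tabular Q-learning with a target table that is frozen over intervals.

    P: array of shape (S, A, S), transition probabilities. It is used only
       to simulate the generative oracle (hypothetical setup).
    R: array of shape (S, A), deterministic rewards.
    """
    rng = np.random.default_rng(seed)
    num_states, num_actions = R.shape
    q = np.zeros((num_states, num_actions))

    for _ in range(outer_iters):
        # Copy the main table once; the copy stays fixed for the interval.
        q_target = q.copy()
        for t in range(1, inner_iters + 1):
            lr = 1.0 / t  # assumed step-size schedule
            for s in range(num_states):
                for a in range(num_actions):
                    # Generative oracle: sample s' ~ P(. | s, a).
                    s_next = rng.choice(num_states, p=P[s, a])
                    # Bootstrap from the frozen target, not from q itself.
                    target = R[s, a] + gamma * q_target[s_next].max()
                    q[s, a] += lr * (target - q[s, a])
    return q
```

With the 1/t step size, each interval ends with `q` equal to the sample average of the targets computed from the frozen `q_target`, which is an empirical Bellman backup of the target table. Plain Q-learning would instead bootstrap from `q` inside the inner loop, so its targets would move with every update.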

q-learning, sample complexity, target q-learning, (10 more...)
