A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle
Q-learning is one of the simplest yet most popular algorithms in the reinforcement learning (RL) community [Sutton and Barto, 2018]. However, Q-learning can diverge when (linear) function approximation is applied [Baird, 1995, Tsitsiklis and Van Roy, 1997]. To address this instability, the DQN algorithm [Mnih et al., 2015] introduced a technique called the target network. In particular, DQN maintains a duplicate of the main Q-network (the so-called target network), which is used to generate the bootstrap signal for updates. One important feature is that the target network is held fixed over intervals, so that, unlike in Q-learning, the learning targets do not change during an interval. [Mnih et al., 2015, Table 3] reports that the target network contributes substantially to the superior performance of DQN.
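To make the fixed-target idea concrete in the tabular setting named in the title, here is a minimal sketch, not the paper's exact algorithm. It assumes a small random finite MDP and a generative oracle that can sample a next state for any state-action pair. The helper names (`make_mdp`, `target_q_learning`, `value_iteration`), the synchronous sampling of every pair, and the 1/t step size within each interval are illustrative assumptions.

```python
import random


def make_mdp(n_states, n_actions, seed=0):
    """Build a small random finite MDP (a hypothetical test problem)."""
    rng = random.Random(seed)
    P, R = [], []  # P[s][a]: next-state probabilities, R[s][a]: reward in [0, 1]
    for _ in range(n_states):
        p_row, r_row = [], []
        for _ in range(n_actions):
            weights = [rng.random() for _ in range(n_states)]
            total = sum(weights)
            p_row.append([w / total for w in weights])
            r_row.append(rng.random())
        P.append(p_row)
        R.append(r_row)
    return P, R


def target_q_learning(P, R, gamma=0.8, n_intervals=30,
                      steps_per_interval=200, seed=1):
    """Tabular Q-learning whose bootstrap values come from a frozen copy."""
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    states = list(range(n_states))
    q = [[0.0] * n_actions for _ in states]
    for _ in range(n_intervals):
        # Freeze the target table: it stays fixed for the whole interval.
        q_target = [row[:] for row in q]
        v_target = [max(row) for row in q_target]
        for t in range(1, steps_per_interval + 1):
            lr = 1.0 / t  # assumed step size: running average of the targets
            for s in states:
                for a in range(n_actions):
                    # Generative oracle: sample s' ~ P(. | s, a) on demand.
                    s_next = rng.choices(states, weights=P[s][a])[0]
                    target = R[s][a] + gamma * v_target[s_next]
                    q[s][a] += lr * (target - q[s][a])
    return q


def value_iteration(P, R, gamma=0.8, n_iters=500):
    """Exact Q* by repeated Bellman optimality backups, for comparison."""
    n_states, n_actions = len(P), len(P[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_iters):
        v = [max(row) for row in q]
        q = [[R[s][a] + gamma * sum(p * v[j] for j, p in enumerate(P[s][a]))
              for a in range(n_actions)] for s in range(n_states)]
    return q


if __name__ == "__main__":
    P, R = make_mdp(n_states=4, n_actions=2)
    q_hat = target_q_learning(P, R)
    q_star = value_iteration(P, R)
    gap = max(abs(q_hat[s][a] - q_star[s][a])
              for s in range(4) for a in range(2))
    print("max |Q_hat - Q*| =", gap)
```

Under these assumptions, each interval computes a sample-based estimate of one Bellman optimality backup of the frozen table, so the outer loop behaves like a noisy form of value iteration. Standard Q-learning would instead bootstrap from the table that is being updated.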
Mar-22-2022