Reverse Experience Replay
arXiv.org Artificial Intelligence
The goal of this environment is to drive up the mountain. However, the car's engine is not strong enough to simply accelerate and scale the mountain. The agent receives a reward of -1 on every frame, so the Q-value of each state depends strongly on the Q-values of the states that follow it. Under these conditions, updating in reverse order is useful. All results are averaged over 3 learning and test iterations. The Deep Q-Network with Reverse Experience Replay shows competitive results against Double DQN with Experience Replay and vanilla DQN with Experience Replay (Figure 5). Double DQN achieves the lowest results because of the Target-Network update: some transitions were sampled before the Target-Network update, so the old max Q-value was used.
Figure 5: Performance of the DQN RER, DDQN ER, and DQN ER algorithms in the Mountain Car Problem (the mean of the test results of 3 different learning processes from 3 different seeds).
Table 1 presents the details of the Mountain Car experiment (NN structure, training and testing hyperparameters).
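To illustrate why update order matters when every step yields -1, here is a minimal tabular sketch. It is not the paper's DQN implementation: the 5-state chain, the single-action simplification, and the function name `q_update_pass` are all assumptions made for illustration. It compares one sweep over a stored trajectory in forward order against one sweep in reverse order.

```python
import numpy as np


def q_update_pass(transitions, n_states, n_actions,
                  gamma=1.0, alpha=1.0, reverse=False):
    """Run one Q-learning sweep over a stored trajectory.

    Hypothetical helper for illustration only. Each transition is a
    tuple (state, action, reward, next_state, done).
    """
    q = np.zeros((n_states, n_actions))
    order = reversed(transitions) if reverse else transitions
    for s, a, r, s_next, done in order:
        # Bootstrap from the next state unless the episode has ended.
        target = r if done else r + gamma * q[s_next].max()
        q[s, a] += alpha * (target - q[s, a])
    return q


# Assumed toy environment: a chain of 5 states with a single action
# that moves right. Every step costs -1 and state 4 is terminal.
trajectory = [(s, 0, -1.0, s + 1, s + 1 == 4) for s in range(4)]

q_forward = q_update_pass(trajectory, n_states=5, n_actions=1)
q_reverse = q_update_pass(trajectory, n_states=5, n_actions=1, reverse=True)

print("forward:", q_forward[:4, 0])
print("reverse:", q_reverse[:4, 0])
```

In this toy setting, a single reverse sweep recovers the true returns (-4, -3, -2, -1) for states 0 to 3, because each update bootstraps from a successor that has already been updated. The forward sweep leaves every visited state at -1, since each bootstrap target still reads an uninitialized successor value.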
Oct-22-2019