Reverse Experience Replay

Rotinov, Egor

arXiv.org Artificial Intelligence 

The goal in this environment is to drive a car up a mountain. However, the car's engine is not strong enough to climb the mountain simply by accelerating. The agent receives a reward of -1 on every frame, so each Q-value depends strongly on the Q-values of the states that follow it. Under these conditions, updating transitions in reverse order is useful.

All results are averaged over 3 learning and test runs. A Deep Q-Network with Reverse Experience Replay (DQN RER) performs competitively against Double DQN with Experience Replay (DDQN ER) and vanilla DQN with Experience Replay (DQN ER) (Figure 5). Double DQN achieves the lowest results because of the target-network update: some transitions were sampled before the target network was updated, so the old maximum Q-value was used.

Figure 5: Performance of the DQN RER, DDQN ER, and DQN ER algorithms on the Mountain Car problem (mean of the test results of 3 learning processes started from 3 different seeds).

Table 1 presents the details of the Mountain Car experiment (NN structure, training and testing hyperparameters).
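Why reverse-order updates help when every step costs -1 can be illustrated with a minimal tabular sketch. This is not the DQN RER implementation evaluated here; it assumes a hypothetical deterministic chain of five steps with a single action, a reward of -1 per step (mirroring Mountain Car), a learning rate of 1 and no discounting. The function names `q_update` and `replay_episode` are illustrative only.

```python
def q_update(q, transition, alpha=1.0, gamma=1.0):
    """One tabular Q-learning update on a (s, a, r, s_next, done) transition."""
    s, a, r, s_next, done = transition
    # Terminal transitions bootstrap from nothing; others use max over next actions.
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def replay_episode(transitions, n_states, n_actions, reverse):
    """Apply one update per stored transition, in forward or reverse order."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    order = reversed(transitions) if reverse else transitions
    for t in order:
        q_update(q, t)
    return q


# Hypothetical toy chain (assumption): states 0..4, state 5 is terminal,
# the single action 0 moves one state to the right, every step gives -1.
n = 5
episode = [(s, 0, -1.0, s + 1, s + 1 == n) for s in range(n)]

q_forward = replay_episode(episode, n + 1, 1, reverse=False)
q_reverse = replay_episode(episode, n + 1, 1, reverse=True)

print([row[0] for row in q_forward[:n]])  # [-1.0, -1.0, -1.0, -1.0, -1.0]
print([row[0] for row in q_reverse[:n]])  # [-5.0, -4.0, -3.0, -2.0, -1.0]
```

In the forward pass each target reads a successor Q-value that has not been updated yet, so every estimate stays at -1. In the reverse pass the terminal value is propagated back to the start state in a single sweep, recovering the true returns of this toy chain.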
