Reverse Experience Replay

Oct-22-2019–arXiv.org Artificial Intelligence

The goal in this environment is to drive the car to the top of the mountain. However, the car's engine is not strong enough to scale the mountain by simply accelerating. The agent receives a reward of -1 on every frame, so the Q-values of consecutive states depend strongly on one another. Under these conditions, updating in reverse order is useful. All results are averaged over 3 learning and test iterations. DQN with Reverse Experience Replay (RER) shows competitive results against Double DQN (DDQN) with Experience Replay (ER) and vanilla DQN with Experience Replay (Figure 5). Double DQN achieves the lowest results because of the Target-Network update (some transitions were sampled before the Target-Network update, so the old max Q-value was used).

Figure 5: Performance of the DQN RER, DDQN ER, and DQN ER algorithms on the Mountain Car problem (mean of the test results of 3 different learning processes from 3 different seeds).

Table 1 presents the details of the Mountain Car experiment (NN structure, training and testing hyperparameters).
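To illustrate why update order matters when every step yields a reward of -1, here is a minimal tabular sketch. It is not the paper's DQN implementation: the 5-step chain, the single action, the function name `q_update_pass`, and the settings `gamma=1.0` and `alpha=1.0` are all assumptions chosen for illustration. It applies one Q-learning update per transition, once in the order the episode was recorded and once in reverse.

```python
import numpy as np

def q_update_pass(q, transitions, gamma=1.0, alpha=1.0):
    """Apply one tabular Q-learning update per transition, in the given order."""
    for s, a, r, s_next, done in transitions:
        # Bootstrapped target: terminal transitions use the reward alone.
        target = r if done else r + gamma * np.max(q[s_next])
        q[s, a] += alpha * (target - q[s, a])
    return q

# Hypothetical chain: states 0..5, one action, reward -1 per step,
# state 5 is terminal (a stand-in for reaching the top of the mountain).
n_states = 6
episode = [(s, 0, -1.0, s + 1, s + 1 == n_states - 1)
           for s in range(n_states - 1)]

# One pass in recorded order versus one pass in reverse order.
q_forward = q_update_pass(np.zeros((n_states, 1)), episode)
q_reverse = q_update_pass(np.zeros((n_states, 1)), episode[::-1])

print("forward:", q_forward[:, 0])
print("reverse:", q_reverse[:, 0])
```

In this toy setting, a single reverse pass propagates the terminal outcome back through the whole episode, so each state's value reflects the remaining number of steps, while a forward pass only moves each value one step away from its initial estimate. With a neural network and a Target-Network the effect is less clean, so this should be read as intuition rather than as a prediction of the reported results.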
