Question about experience replay in deep Q-learning • /r/MachineLearning
I am not sure I understood this correctly. In each state, we update the score of the chosen action to the best Q-value of the next state and leave the scores of the other actions unchanged. We then put the state, together with the updated scores of all actions, into memory. We sample N pairs from memory (do they need to come from the same game?) and train on them all together. So do we only calculate the new score for the transition we just took, and reuse the already-calculated scores of the previous transitions stored in memory?
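For comparison, here is a minimal sketch of how experience replay is usually set up in DQN-style code, as far as I know. The memory holds raw transitions (state, action, reward, next state, done flag) rather than precomputed scores, and the targets are recomputed from the current network every time a batch is sampled. Samples are drawn uniformly from the whole memory, so they do not have to come from the same game. The names `ReplayBuffer` and `build_targets`, the discount `GAMMA`, and the plain NumPy arrays standing in for network outputs are all assumptions made for illustration.

```python
import random
from collections import deque

import numpy as np

GAMMA = 0.99  # assumed discount factor, not taken from the post


class ReplayBuffer:
    """Fixed-size memory of raw transitions (hypothetical helper)."""

    def __init__(self, capacity):
        # Oldest transitions are dropped automatically once capacity is hit.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # Store the transition itself, not any Q-values computed from it.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, across all stored episodes.
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (
            np.array(states),
            np.array(actions),
            np.array(rewards, dtype=np.float64),
            np.array(next_states),
            np.array(dones, dtype=bool),
        )

    def __len__(self):
        return len(self.buffer)


def build_targets(q_values, next_q_values, actions, rewards, dones, gamma=GAMMA):
    """Build training targets for one sampled batch.

    q_values and next_q_values have shape (batch, n_actions) and stand in
    for the current network's outputs on the states and next states.
    """
    targets = q_values.copy()  # untaken actions keep their predictions
    best_next = next_q_values.max(axis=1)
    not_done = 1.0 - dones.astype(np.float64)  # no bootstrap on terminal steps
    td_target = rewards + gamma * best_next * not_done
    targets[np.arange(len(actions)), actions] = td_target
    return targets
```

In a training loop you would call `add` after every environment step, then `sample` a batch, run the network on the sampled states and next states, and fit it to the output of `build_targets`. Because the targets come from the current network each time, old scores are not reused.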
Apr-10-2016, 10:05:06 GMT