An introduction to Deep Q-Learning: let's play Doom
At each time step, we receive a tuple (state, action, reward, new_state). We learn from it (we feed the tuple into our neural network) and then throw that experience away. The problem is that we feed sequential samples from our interactions with the environment to the neural network, and it tends to forget previous experiences as it overwrites them with new ones. For instance, if we train on the first level and then on the second (which is totally different), our agent can forget how to behave in the first level.
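The standard remedy for both issues is an experience replay buffer: instead of discarding each tuple, we store it and later train on random minibatches of past experiences. Here is a minimal sketch; the class name, buffer size, and method names are illustrative assumptions, not code from this article:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (state, action, reward, new_state) tuples so the agent can
    learn from random past experiences instead of only the latest one."""

    def __init__(self, max_size=10000):
        # A deque with maxlen silently drops the oldest tuples when full
        self.buffer = deque(maxlen=max_size)

    def add(self, state, action, reward, new_state):
        self.buffer.append((state, action, reward, new_state))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between
        # sequential experiences that causes the forgetting described above
        return random.sample(self.buffer, batch_size)
```

At each step the agent would call `add(...)` with the new tuple, then periodically draw a minibatch with `sample(batch_size)` and train the network on it, so experiences from earlier levels keep being revisited.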
Apr-16-2018, 22:25:25 GMT