Reinforcement Learning: Diverging weights in Predator-Prey Environment • /r/MachineLearning
I am teaching myself reinforcement learning, mainly using https://sites.ualberta.ca/ The environment I am testing the algorithms in is pretty simple: 3 predator agents and 1 randomly moving prey in a grid world (about 15x15), where each can move up, down, left, or right. At the moment I am learning about function approximation. The update quantity I am using is learning_rate * bellman_error * gradient Q(X_t, A_t), as seen on page 59 of the above paper. Whether I use a linear function or a neural network, my weights diverge very quickly (using SARSA; I haven't tried Q-learning yet, but I would be surprised if they didn't diverge there too). I checked the algorithm's calculations by hand and they seem right.
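For the linear case, the update described above can be sketched roughly as follows. This is only a minimal illustration, assuming one weight vector per action and a TD error of the form reward + discount * Q(X_t+1, A_t+1) - Q(X_t, A_t); the function names, the feature vectors, and the hyperparameter values are all hypothetical.

```python
def q_value(weights, features, action):
    """Linear estimate Q(x, a) = w_a . phi(x), one weight vector per action."""
    return sum(w * f for w, f in zip(weights[action], features))


def sarsa_update(weights, features, action, reward, next_features, next_action,
                 learning_rate=0.1, discount=0.9, terminal=False):
    """Apply one semi-gradient SARSA step in place and return the TD error."""
    target = reward
    if not terminal:
        # Bootstrap from the action actually chosen in the next state (SARSA).
        target += discount * q_value(weights, next_features, next_action)
    td_error = target - q_value(weights, features, action)
    # For a linear Q, the gradient with respect to w_a is simply phi(x).
    for i, f in enumerate(features):
        weights[action][i] += learning_rate * td_error * f
    return td_error


# Hypothetical example: 4 actions (up, down, left, right), 3 made-up features.
weights = [[0.0, 0.0, 0.0] for _ in range(4)]
phi_t = [1.0, 0.5, 0.0]
phi_next = [1.0, 0.0, 0.5]
td_error = sarsa_update(weights, phi_t, 0, 1.0, phi_next, 2)
```

In this linear form, a single update changes Q(X_t, A_t) by learning_rate * td_error * ||phi||^2, so the effective step size grows with the magnitude of the features. Whether that plays a role here depends on how the features are built, which the post does not say.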
Sep-23-2016, 11:45:12 GMT