Appendices A Reinforcement Learning using Matrix Estimation: The Pseudo Code

Neural Information Processing Systems 

For the mean error, we use the average of the (absolute) difference over this grid.