8a0e1141fd37fa5b98d5bb769ba1a7cc-AuthorFeedback.pdf

Neural Information Processing Systems 

When they optimize the squared TD-error, they do not calculate gradient w.r.t. the parameterθ of the18 target(e.g.