A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning
