When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient
