Going Beyond Heuristics by Imposing Policy Improvement as a Constraint

Chi-Chang Lee

Neural Information Processing Systems 

As such, we prevent policies from merely exploiting heuristic rewards without improving the task reward.
