Information Directed Reward Learning for Reinforcement Learning

Neural Information Processing Systems 

From such expensive feedback, we aim to learn a model of the reward that allows standard RL algorithms to achieve high expected returns with as few expert queries as possible.
