InformationDirectedRewardLearning forReinforcementLearning
–Neural Information Processing Systems
From such expensive feedback, we aim to learn a model of the reward that allows standard RL algorithms to achieve high expected returnswith as few expert queries as possible.
Neural Information Processing Systems
Feb-7-2026, 19:03:16 GMT
- Technology: