

Reviews: Occam's razor is insufficient to infer the preferences of irrational agents

Neural Information Processing Systems

Summary: The paper addresses the inverse reinforcement learning problem and the ambiguity inherent in this ill-posed problem. The authors argue that one cannot learn a reward alone to explain human behavior, but should instead learn both the reward and the planner at the same time. In that setting, they show that many (planner, reward) pairs can explain the observed human behavior (or preferences), including a planner that optimizes a reward exactly opposite to the true reward. First, they provide a bound on the worst-case regret of a policy. Second, they show that the rewards that are compatible with the expert policy and have the lowest complexity can be very far from the actual reward optimized by the expert policy.
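The (planner, reward) ambiguity can be illustrated with a toy example. The sketch below rests on several assumptions that are not taken from the paper. It uses a single-state, bandit-style setting with three hypothetical actions. The planner functions `rational` and `anti_rational` and the helper `negate` are hypothetical names chosen for illustration. The example shows that a rational planner with reward R and an anti-rational planner with reward -R produce the same observed behavior, so behavior alone cannot tell the two pairs apart.

```python
from typing import Callable, Dict

# Hypothetical types: a reward maps each action to a scalar,
# and a planner maps a reward to the chosen action (a one-state "policy").
Reward = Dict[str, float]
Planner = Callable[[Reward], str]


def rational(reward: Reward) -> str:
    """Hypothetical rational planner: picks the highest-reward action."""
    return max(reward, key=reward.get)


def anti_rational(reward: Reward) -> str:
    """Hypothetical anti-rational planner: picks the lowest-reward action."""
    return min(reward, key=reward.get)


def negate(reward: Reward) -> Reward:
    """Return the reward with every value sign-flipped."""
    return {action: -value for action, value in reward.items()}


# Illustrative reward with no ties, so max/min are unambiguous.
true_reward: Reward = {"left": 1.0, "stay": 0.0, "right": -2.0}

# Two different (planner, reward) pairs...
observed_action = rational(true_reward)
alternative_action = anti_rational(negate(true_reward))

# ...that are indistinguishable from the behavior they produce.
print(observed_action, alternative_action)
```

Both pairs choose "left". Any learner that sees only the chosen action has no behavioral evidence for preferring one decomposition over the other. This toy setting mirrors the degeneracy the summary describes.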