Reinforcement Learning
Teaching Inverse Reinforcement Learners via Features and Demonstrations
Luis Haug, Sebastian Tschiatschek, Adish Singla
We introduce a natural quantity, the teaching risk, which measures the potential suboptimality of policies that look optimal to the learner in this setting. We show that bounds on the teaching risk guarantee that the learner is able to find a near-optimal policy using standard algorithms based on inverse reinforcement learning. Based on these findings, we suggest a teaching scheme in which the expert can decrease the teaching risk by updating the learner's worldview, and thus ultimately enable her to find a near-optimal policy.
eda9523faa5e7191aee1c2eaff669716-Supplemental-Conference.pdf
Though promising results have been reported in some RL application domains, policies learned with such representations usually fail to generalize well in complex environments, because minimizing a reconstruction loss may introduce local (visual) features that carry task-irrelevant information.
eda9523faa5e7191aee1c2eaff669716-Paper-Conference.pdf