A Reduction from Apprenticeship Learning to Classification

Feb-15-2020, 03:41:50 GMT–Neural Information Processing Systems

We provide new theoretical results for apprenticeship learning, a variant of reinforcement learning in which the true reward function is unknown and the goal is to perform well relative to an observed expert. We study a common approach to learning from expert demonstrations: using a classification algorithm to learn to imitate the expert's behavior. Although this straightforward learning strategy is widely used in practice, it has been subject to very little formal analysis. We prove that, if the learned classifier has error rate $\epsilon$, the difference between the value of the apprentice's policy and the expert's policy is $O(\sqrt{\epsilon})$. Further, we prove that this difference is only $O(\epsilon)$ when the expert's policy is close to optimal.
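The classification-based strategy described above can be illustrated with a minimal sketch. This is not the paper's algorithm or analysis. It is a toy tabular imitator, assuming discrete states and actions, that predicts the expert's most frequent action in each state and then estimates the error rate $\epsilon$ on held-out expert pairs. The function names, the toy data, and the choice of a majority-vote classifier are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_imitation_classifier(demonstrations):
    """Map each observed state to the expert's most frequent action in it."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


def error_rate(policy, demonstrations, default_action=None):
    """Fraction of expert (state, action) pairs the policy disagrees with."""
    mistakes = sum(policy.get(s, default_action) != a for s, a in demonstrations)
    return mistakes / len(demonstrations)


# Hypothetical expert demonstrations as (state, action) pairs.
train = [(0, "right"), (0, "right"), (1, "up"), (1, "right"), (1, "up"), (2, "left")]
held_out = [(0, "right"), (1, "up"), (2, "right"), (2, "left")]

policy = fit_imitation_classifier(train)
eps = error_rate(policy, held_out)  # empirical estimate of the classifier error rate
print(policy, eps)
```

Here `eps` plays the role of the classifier error rate in the abstract. The bounds themselves concern the resulting gap in policy value, which this sketch does not compute.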

apprenticeship learning, classification, demonstration

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)