Contextual Bandits and Imitation Learning with Preference-Based Active Queries

Neural Information Processing Systems 

We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward.
