Teachable Reinforcement Learning via Advice Distillation

Neural Information Processing Systems 

... and what sub-goals to accomplish ("pick up the yellow ball"), offering ...

We then describe an algorithmic framework for learning in CAMDPs via alternating advice grounding and distillation phases.

... "place the yellow ball in the green box and the blue key in the green box" or "open all doors in ...

In multi-task RL, a learner's objective is to produce a policy ...

For instance, the agent in Fig 3 can leverage hints "go left" or "move towards the blue key" to guide ...

In the grounding phase, agents learn how to interpret coaching.
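The alternation between advice grounding and distillation mentioned above can be illustrated with a toy sketch. This is not the paper's implementation: the corridor environment, the `coach` function, the tabular learners, and every name below are hypothetical. The excerpt describes only the grounding phase, so the distillation step shown here (imitating the advice-following agent with a policy that no longer receives hints) is an assumption about what that phase might look like.

```python
import random
from collections import Counter, defaultdict

N = 5                                  # toy corridor with cells 0..N-1 (hypothetical)
ACTIONS = (-1, +1)                     # step left / step right
ADVICE = ("go left", "go right")       # hint vocabulary, echoing the "go left" example

def coach(state, goal):
    """Hypothetical coach: names the direction of the goal."""
    return "go left" if goal < state else "go right"

def ground_advice(episodes=500, eps=0.2, lr=0.5, seed=0):
    """Grounding phase (sketch): learn what each hint means from reward alone."""
    rng = random.Random(seed)
    q = {(c, a): 0.0 for c in ADVICE for a in ACTIONS}
    for _ in range(episodes):
        state, goal = rng.sample(range(N), 2)          # distinct start and goal
        hint = coach(state, goal)
        if rng.random() < eps:                         # epsilon-greedy exploration
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(hint, a)])
        # Reward 1 if the step moved the agent closer to the goal, else 0.
        reward = 1.0 if abs(state + action - goal) < abs(state - goal) else 0.0
        q[(hint, action)] += lr * (reward - q[(hint, action)])
    return {c: max(ACTIONS, key=lambda a: q[(c, a)]) for c in ADVICE}

def distill(grounded):
    """Distillation phase (assumed form): imitate the advice follower,
    producing a policy keyed on (state, goal) that needs no hints."""
    votes = defaultdict(Counter)
    for goal in range(N):
        for start in range(N):
            state = start
            for _ in range(N):                         # cap rollout length
                if state == goal:
                    break
                action = grounded[coach(state, goal)]  # coach stays in the loop
                votes[(state, goal)][action] += 1      # supervised target
                state = min(max(state + action, 0), N - 1)
    return {key: c.most_common(1)[0][0] for key, c in votes.items()}

def run_without_advice(policy, start, goal):
    """Roll out the distilled policy with no coach present."""
    state, steps = start, 0
    while state != goal and steps < N:
        state += policy[(state, goal)]
        steps += 1
    return state

grounded = ground_advice()
policy = distill(grounded)
print(grounded, run_without_advice(policy, 0, 4))
```

The point of the split is that the hint-conditioned table is learned from a dense, easy signal, while the final policy depends only on the state and goal and can therefore act when no coach is available.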
