Teachable Reinforcement Learning via Advice Distillation
–Neural Information Processing Systems
"), and what sub-goals to accomplish ("pick up the yellow ball"), offering We then describe an algorithmic framework for learning in CAMDPs via alternating advice grounding and distillation phases. "place the yellow ball in the green box and the blue key in the green box" or "open all doors in In multi-task RL, a learner's objective is produce a policy For instance, the agent in Fig 3 can leverage hints "go left" or "move towards the blue key" to guide In the grounding phase, agents learn how to interpret coaching.
Nov-13-2025, 22:22:53 GMT
- Country:
  - Asia > Middle East > Jordan (0.04)
  - Europe > Netherlands > North Brabant > Eindhoven (0.04)
  - North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Genre:
  - Research Report (0.69)
- Industry:
  - Education (1.00)
- Technology: