Chain of Thought Imitation with Procedure Cloning
Neural Information Processing Systems
Imitation learning aims to extract high-performance policies from logged demonstrations of expert behavior. It is common to frame imitation learning as a supervised learning problem in which one fits a function approximator to the input-output mapping exhibited by the logged demonstrations (input observations to output actions). While this framing makes imitation learning applicable in a wide variety of settings, it is also an overly simplistic view of the problem in situations where the expert demonstrations provide much richer insight into expert behavior. For example, in applications such as path navigation, robot manipulation, and strategy games, expert demonstrations are acquired via planning, search, or some other multi-step algorithm, revealing not just the output action to be imitated but also the procedure used to determine this action. While these intermediate computations may use tools not available to the agent during inference (e.g., environment simulators), they are nevertheless informative as a way to explain an expert's mapping of states to actions.
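The contrast drawn above can be illustrated with a small sketch. It assumes a hypothetical 4x4 grid maze and a breadth-first-search (BFS) expert; the grid, the function name bfs_expert, and the way the search trace is encoded are all illustrative assumptions, not the paper's actual setup. The sketch only builds the two kinds of training example: an input-output pair (state to action), and a pair whose target also contains the expert's intermediate search steps.

```python
from collections import deque

# Hypothetical 4x4 grid (an assumption for illustration): 0 = free, 1 = wall.
GRID = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def bfs_expert(start, goal):
    """Return (first_action, trace).

    first_action is the expert's first move on a shortest path, and trace is
    the list of cells BFS expanded, in order (the expert's "procedure").
    """
    parent = {start: None}  # cell -> (previous cell, move taken to reach it)
    queue = deque([start])
    trace = []
    while queue:
        cell = queue.popleft()
        trace.append(cell)
        if cell == goal:
            break
        for name, (dr, dc) in MOVES.items():
            nxt = (cell[0] + dr, cell[1] + dc)
            in_bounds = 0 <= nxt[0] < len(GRID) and 0 <= nxt[1] < len(GRID[0])
            if in_bounds and GRID[nxt[0]][nxt[1]] == 0 and nxt not in parent:
                parent[nxt] = (cell, name)
                queue.append(nxt)
    if goal not in parent:
        return None, trace  # goal unreachable
    # Walk back from the goal; the last move seen is the first move from start.
    cell, action = goal, None
    while parent[cell] is not None:
        cell, action = parent[cell]
    return action, trace


start, goal = (0, 0), (3, 3)
action, trace = bfs_expert(start, goal)

# Supervised input-output framing: the target is the action alone.
bc_example = ((start, goal), action)

# Procedure-style framing: the target is the search trace followed by the action.
pc_example = ((start, goal), trace + [action])

print("input-output target:", bc_example[1])
print("procedure target length:", len(pc_example[1]))
```

Both examples share the same input; only the target differs. A sequence model trained on the second kind of target would be asked to reproduce the intermediate computation before emitting the action, which is the richer supervision signal the abstract refers to.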
Jan-19-2025, 05:31:53 GMT