Regularized Behavior Cloning for Blocking the Leakage of Past Action Information

Oct-9-2024, 11:05:04 GMT–Neural Information Processing Systems

For partially observable environments, imitation learning with observation histories (ILOH) assumes that the observation histories capture enough control-relevant information to imitate the expert actions. In the offline setting, where the agent must learn to imitate without interacting with the environment, behavior cloning (BC) has been shown to be a simple yet effective method for imitation learning. However, when information about the actions executed at past timesteps leaks into the observation histories, ILOH via BC often ends up imitating its own past actions. In this paper, we address this catastrophic failure by proposing a principled regularization for BC, which we name Past Action Leakage Regularization (PALR). The main idea behind our approach is to leverage the classical notion of conditional independence to mitigate the leakage.
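
As a rough sketch of that idea, under notation assumed here rather than taken from the abstract: let $h_t$ be the observation history, $\varphi$ a learned representation of it, $a_t$ the expert action at time $t$, and $a_{t-1}$ the previous action. A regularized BC objective could then penalize departures from the conditional independence on the first line, where $\mathcal{R}$ stands for any measure of conditional dependence and $\alpha \ge 0$ is a hypothetical weight. The abstract does not specify which measure or which conditioning variable is used, so both are assumptions.

```latex
% Assumed target: the representation carries no information about the
% past action beyond what the current expert action already explains.
\varphi(h_t) \perp\!\!\!\perp a_{t-1} \mid a_t

% Assumed regularized BC objective (symbols are illustrative).
\min_{\pi,\,\varphi} \;
  \mathbb{E}\!\left[ -\log \pi\!\left(a_t \mid \varphi(h_t)\right) \right]
  + \alpha \, \mathcal{R}\!\left(\varphi(h_t),\, a_{t-1} \mid a_t\right)
```

With $\alpha = 0$ this reduces to plain behavior cloning on observation histories.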

artificial intelligence, machine learning, observation history

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (0.66)
  - Representation & Reasoning (0.50)