causal confusion problem



Appendix A: Source codes

Neural Information Processing Systems

Specifically, we average the scores over 100 episodes evaluated on confounded environments for each random seed. We use the Adam optimizer with a learning rate of 3e-4. Note that the other regularization baselines are based on BC. In particular, OREO achieves a mean HNS of 114.9%. Figure 9 compares OREO to CCIL with environment interaction on 6 confounded Atari environments. We also investigate the possibility of applying OREO to other IL methods.
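The excerpt reports results as mean HNS, the standard Atari human-normalized score. As a reminder of how that metric is computed (the per-game random and human reference scores are not given in this excerpt):

```python
def human_normalized_score(agent_score, random_score, human_score):
    """Standard Atari human-normalized score, in percent.

    100% means the agent matches the human reference; 0% means it
    matches a random policy. Reference scores are game-specific.
    """
    return 100.0 * (agent_score - random_score) / (human_score - random_score)
```

For example, an agent matching the human reference on a game scores exactly 100%, and a mean HNS of 114.9% means the agent exceeds human-level play on average across games.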



GABRIL: Gaze-Based Regularization for Mitigating Causal Confusion in Imitation Learning

Banayeeanzade, Amin, Bahrani, Fatemeh, Zhou, Yutai, Bıyık, Erdem

arXiv.org Artificial Intelligence

Imitation Learning (IL) is a widely adopted approach that enables agents to learn from human expert demonstrations by framing the task as a supervised learning problem. However, IL often suffers from causal confusion, where agents misinterpret spurious correlations as causal relationships, leading to poor performance in testing environments with distribution shift. To address this issue, we introduce GAze-Based Regularization in Imitation Learning (GABRIL), a novel method that leverages human gaze data gathered during the data collection phase to guide representation learning in IL. GABRIL utilizes a regularization loss that encourages the model to focus on causally relevant features identified through expert gaze, and consequently mitigates the effects of confounding variables. We validate our approach in Atari environments and the Bench2Drive benchmark in CARLA by collecting human gaze datasets and applying our method in both domains. Experimental results show that GABRIL's improvement over behavior cloning is around 179% larger than that of the other baselines in the Atari setup and 76% larger in the CARLA setup. Finally, we show that our method provides extra explainability when compared to regular IL agents.
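The abstract describes a regularization loss that pulls the model's attention toward expert gaze. A minimal sketch of that idea, assuming a KL-divergence penalty between normalized attention and gaze heatmaps (the function names and the exact form of the loss are illustrative assumptions, not GABRIL's published objective):

```python
import numpy as np

def gaze_regularization_loss(attention_map, gaze_map, eps=1e-8):
    """KL divergence from the model's attention to the expert-gaze heatmap.

    Both inputs are non-negative 2-D arrays over image coordinates and are
    normalized here into probability distributions before comparison.
    """
    p = gaze_map / (gaze_map.sum() + eps)            # expert gaze distribution
    q = attention_map / (attention_map.sum() + eps)  # model attention distribution
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def total_loss(bc_loss, attention_map, gaze_map, lam=0.1):
    # Combined objective: behavior-cloning loss plus the gaze regularizer,
    # weighted by a hypothetical coefficient `lam`.
    return bc_loss + lam * gaze_regularization_loss(attention_map, gaze_map)
```

When the attention map matches the gaze map the penalty vanishes, so the regularizer only steers the policy when its attention drifts away from what the expert actually looked at.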


Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning

Park, Jongjin, Seo, Younggyo, Liu, Chang, Zhao, Li, Qin, Tao, Shin, Jinwoo, Liu, Tie-Yan

arXiv.org Artificial Intelligence

Behavioral cloning has proven to be effective for learning sequential decision-making policies from expert demonstrations. However, behavioral cloning often suffers from the causal confusion problem where a policy relies on the noticeable effect of expert actions due to the strong correlation but not the cause we desire. This paper presents Object-aware REgularizatiOn (OREO), a simple technique that regularizes an imitation policy in an object-aware manner. Our main idea is to encourage a policy to uniformly attend to all semantic objects, in order to prevent the policy from exploiting nuisance variables strongly correlated with expert actions. To this end, we introduce a two-stage approach: (a) we extract semantic objects from images by utilizing discrete codes from a vector-quantized variational autoencoder, and (b) we randomly drop the units that share the same discrete code together, i.e., masking out semantic objects. Our experiments demonstrate that OREO significantly improves the performance of behavioral cloning, outperforming various other regularization and causality-based methods on a variety of Atari environments and a self-driving CARLA environment. We also show that our method even outperforms inverse reinforcement learning methods trained with a considerable amount of environment interaction.
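The second stage of the two-stage approach above, dropping all units that share a discrete code, can be sketched as follows. This is an illustrative implementation assuming a precomputed VQ-VAE code map per spatial position; the function name and drop probability are assumptions, not the paper's released code:

```python
import numpy as np

def oreo_mask(features, code_map, drop_prob=0.5, rng=None):
    """Object-level dropout in the spirit of OREO.

    features: (H, W, C) feature map; code_map: (H, W) integer VQ-VAE codes.
    Every spatial position sharing a dropped code is zeroed together, so
    whole semantic objects are masked rather than independent units.
    """
    rng = np.random.default_rng() if rng is None else rng
    codes = np.unique(code_map)
    dropped = codes[rng.random(len(codes)) < drop_prob]  # codes to mask out
    keep = ~np.isin(code_map, dropped)                   # (H, W) boolean mask
    return features * keep[..., None]                    # broadcast over channels
```

Because the random choice is made per code rather than per unit, a policy cannot recover a masked nuisance object from neighboring units, which is what distinguishes this scheme from ordinary dropout.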