Appendix A Source codes
Neural Information Processing Systems
Specifically, we average the scores over 100 episodes evaluated on confounded environments for each random seed. We use the Adam optimizer with a learning rate of 3e-4. Note that the other regularization baselines are based on behavioral cloning (BC). In particular, OREO achieves a mean human-normalized score (HNS) of 114.9%, while ...

Figure 9: We compare OREO to CCIL with environment interaction on 6 confounded Atari environments.

We investigate the possibility of applying OREO to other imitation learning (IL) methods.
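The evaluation protocol and the HNS metric mentioned above can be illustrated with a minimal sketch. It assumes the common Atari definition of the human-normalized score, 100 * (agent - random) / (human - random), and assumes that episode returns are averaged per seed, then across seeds, then across environments; the excerpt does not state the exact aggregation order. The function names, environment names, and all numbers are hypothetical and are not results from the paper.

```python
from statistics import mean


def human_normalized_score(agent_score, random_score, human_score):
    """Return the human-normalized score (HNS) in percent.

    Assumed definition: 100 * (agent - random) / (human - random).
    """
    return 100.0 * (agent_score - random_score) / (human_score - random_score)


def seed_score(episode_returns):
    """Average the evaluation-episode returns collected for one random seed."""
    return mean(episode_returns)


def mean_hns(per_env_seed_returns, reference_scores):
    """Average HNS over environments.

    per_env_seed_returns: {env_name: [[episode returns for seed 0], ...]}
    reference_scores:     {env_name: (random_score, human_score)}
    """
    hns_values = []
    for env_name, seeds in per_env_seed_returns.items():
        random_score, human_score = reference_scores[env_name]
        # Per-seed averages first, then the average across seeds (assumed order).
        agent_score = mean(seed_score(returns) for returns in seeds)
        hns_values.append(
            human_normalized_score(agent_score, random_score, human_score)
        )
    return mean(hns_values)


# Hypothetical data: 2 environments, 2 seeds each, 2 episodes per seed
# (the text above uses 100 episodes per seed).
returns = {
    "EnvA": [[10.0, 30.0], [20.0, 20.0]],
    "EnvB": [[5.0, 5.0], [15.0, 15.0]],
}
references = {"EnvA": (0.0, 40.0), "EnvB": (0.0, 10.0)}

result = mean_hns(returns, references)
print(f"mean HNS: {result:.1f}%")
```

With these made-up inputs, EnvA scores 50% and EnvB scores 100%, so the mean HNS is 75%. A value above 100%, such as the 114.9% reported above, would mean the agent exceeds the human reference score on average under this definition.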
Oct-2-2025, 14:48:45 GMT