Appendix A Source code

Neural Information Processing Systems 

Specifically, we average the scores over 100 episodes evaluated on confounded environments for each random seed. We use the Adam optimizer with a learning rate of 3e-4. Note that the other regularization baselines are based on BC. In particular, OREO achieves a mean HNS of 114.9%.

Figure 9: We compare OREO to CCIL with environment interaction on 6 confounded Atari environments.

We investigate the possibility of applying OREO to other IL methods.
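The evaluation described above (averaging episode scores per random seed, then reporting HNS) can be illustrated with a minimal sketch. It assumes that HNS denotes the human-normalized score in the form commonly used for Atari, 100 * (agent - random) / (human - random); this excerpt does not state the formula. The function names, the averaging across seeds, and all numbers in the usage lines are hypothetical and are not results from the paper.

```python
from statistics import mean


def human_normalized_score(agent_score, random_score, human_score):
    """Return the human-normalized score in percent.

    Assumed convention: 0% matches a random policy, 100% matches the
    human reference score.
    """
    return 100.0 * (agent_score - random_score) / (human_score - random_score)


def evaluate(episode_returns_by_seed, random_score, human_score):
    """Average episode returns within each seed, then across seeds,
    and convert the result to a human-normalized score.

    episode_returns_by_seed maps a seed to its list of episode returns
    (100 episodes per seed in the setup described above).
    """
    per_seed_means = [mean(returns) for returns in episode_returns_by_seed.values()]
    return human_normalized_score(mean(per_seed_means), random_score, human_score)


# Hypothetical toy data: two seeds with two episodes each.
toy_returns = {0: [10, 30], 1: [20, 40]}
toy_hns = evaluate(toy_returns, random_score=0, human_score=100)
```

Averaging within each seed first keeps every seed equally weighted even if the number of evaluated episodes differs between seeds.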
