A Supplemental Details

Neural Information Processing Systems 

Here, we formally define all intrinsic rewards evaluated in the paper. All algorithms use the same network architectures. Observations and changes counts are based on egocentric and panoramic views, respectively. MiniGrid, egocentric views are 147-dimensional while panoramic views are 588-dimensional. Gradients are clipped to have maximum norm 40.