mean and standard deviation
Appendix A Source codes
Source codes for reproducing our experimental results are available at https://github.com/. We use the DQNReplay dataset [1] for expert demonstrations on 27 Atari environments [5]. To keep the dataset size consistent across environments, we use N ∈ {20, 50} expert demonstrations. Table 4 lists the dataset size for each environment. Using the Dopamine library [9], we preprocess input images into 84 × 84 × 1 grayscale images.
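The preprocessing step can be illustrated with a minimal NumPy sketch that turns an RGB frame into an 84 × 84 × 1 grayscale array. This is an assumption-laden stand-in, not Dopamine's implementation: the function name `preprocess_frame` is hypothetical, the luma weights are the standard ITU-R BT.601 coefficients, and the resize uses nearest-neighbour sampling rather than whatever interpolation Dopamine applies. It also omits any frame max-pooling or frame stacking.

```python
import numpy as np


def preprocess_frame(rgb_frame: np.ndarray, size: int = 84) -> np.ndarray:
    """Convert an RGB frame of shape (H, W, 3) into a (size, size, 1) uint8 grayscale image.

    Hypothetical helper for illustration only; nearest-neighbour resizing is an
    assumption and is not necessarily what the Dopamine library does.
    """
    # ITU-R BT.601 luma weights: gray = 0.299 R + 0.587 G + 0.114 B.
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb_frame.astype(np.float64) @ weights  # shape (H, W)

    # Nearest-neighbour resize: pick evenly spaced source rows and columns.
    h, w = gray.shape
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    resized = gray[np.ix_(rows, cols)]

    # Round back to uint8 and add a trailing channel axis -> (size, size, 1).
    return np.clip(np.round(resized), 0, 255).astype(np.uint8)[..., None]


if __name__ == "__main__":
    # A random frame with the usual 210 x 160 Atari screen size.
    frame = np.random.randint(0, 256, size=(210, 160, 3), dtype=np.uint8)
    out = preprocess_frame(frame)
    print(out.shape, out.dtype)
```

Nearest-neighbour sampling keeps the sketch dependency-free; an area-averaging resize (for example via OpenCV) would smooth out thin sprites less aggressively.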
Supplementary Material A Data Modeling
In this section, we provide further details on our data modeling. Modeling the terminal variable appropriately is difficult: it is binary, whereas the remaining dimensions are continuous for the environments we investigate. This is particularly challenging for "expert" datasets, where early termination is rare. An immediate advantage of sampling data from a generative model is compression. As we discuss in Appendix B.3, sampling is fast, and the approach provides high levels of dataset compression without sacrificing downstream performance in offline reinforcement learning.
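One common way to handle a binary terminal flag inside an otherwise continuous generative model is to treat the flag as a continuous dimension during training and threshold generated values back to {0, 1} afterwards. The sketch below shows only that post-processing step. It is an illustrative assumption, not necessarily the procedure used here: the helper name `binarize_terminals`, the 0.5 threshold, and the convention that the terminal flag is the last column are all hypothetical.

```python
import numpy as np


def binarize_terminals(samples: np.ndarray, terminal_col: int = -1,
                       threshold: float = 0.5) -> np.ndarray:
    """Round the continuous terminal column of generated transitions to {0, 1}.

    Hypothetical helper: assumes each row is one flattened transition and the
    terminal flag sits in column `terminal_col`.
    """
    out = samples.copy()
    # Values at or above the threshold become 1.0 (terminal), everything else 0.0.
    out[:, terminal_col] = (out[:, terminal_col] >= threshold).astype(out.dtype)
    return out


def terminal_rate(samples: np.ndarray, terminal_col: int = -1) -> float:
    """Fraction of transitions flagged as terminal (useful to check rare-termination data)."""
    return float(samples[:, terminal_col].mean())


if __name__ == "__main__":
    generated = np.array([
        [0.10, 0.20, 0.03],   # clearly non-terminal
        [0.50, -0.10, 0.91],  # clearly terminal
        [1.00, 0.00, 0.49],   # just below the threshold -> non-terminal
    ])
    fixed = binarize_terminals(generated)
    print(fixed[:, -1], terminal_rate(fixed))
```

Comparing `terminal_rate` on generated data against the real dataset is a quick sanity check; with expert data, where terminals are rare, even a small bias in the generated flag can noticeably change this rate.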