A Task descriptions and assumptions used for the different methods of reward shaping

Neural Information Processing Systems 

This supplementary material provides additional results and discussion, as well as implementation details. Section A summarises the different tasks and the assumptions used in RIDE, EAGER, and ELLA. Section B gives more details about the training of the QA module and the agent, and explains how we built the training data set for the QA module. Section C gathers several results on EAGER: a comparison with behavioural cloning, the generalisation capacity of the QA module, and robustness results for EAGER. Section D contains a commented version of the EAGER algorithm. Table 1 describes the tasks used in the experiments, giving an example of each and indicating whether it was used to train the QA module or the agent.