A Appendix

Neural Information Processing Systems 

A.1 Tabular experiments

A.1.1 Implementation Details

For our experiments on the FourRooms and FourRoomsTraps domains, we based our implementation on [Bacon et al., 2016] and ran the experiments for 300 episodes, each lasting at most 1000 steps. As these are tabular domains, each state is defined by a single feature for both the actor and the critic. Full hyperparameters are listed here:

Hyperparameter      Value
Actor lr            1e-1
Critic lr           1e-1
Discount            0.99
Max Steps           1000
Temperature         1e-1
GCN: hidden size    64
GCN: α              0.6
GCN: η              1e1

Table 2: Hyperparameters for the FourRooms and FourRoomsTraps domains

In Fig we notice a close resemblance in the final output. In this particular environment, the messages obtained by performing the forward-backward algorithm are therefore well approximated by the propagation mechanism of the GCN. This also results in very similar empirical performance.
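As an illustration of how the values in Table 2 might be used, the following is a minimal sketch rather than the authors' implementation. The dictionary keys, the function name `softmax_policy`, and the table sizes are hypothetical. It also assumes that the temperature parametrizes a Boltzmann (softmax) actor, which the text does not state explicitly. With a single feature per state, a linear actor reduces to a lookup table of action preferences.

```python
import math

# Values copied from Table 2; the key names are hypothetical.
HPARAMS = {
    "actor_lr": 1e-1,
    "critic_lr": 1e-1,
    "discount": 0.99,
    "max_steps": 1000,
    "temperature": 1e-1,
    "gcn_hidden_size": 64,
    "gcn_alpha": 0.6,
    "gcn_eta": 1e1,
}


def softmax_policy(preferences, temperature):
    """Return a Boltzmann distribution over actions for one state.

    Assumption: the temperature divides the action preferences before
    the softmax; the source only lists its value.
    """
    scaled = [p / temperature for p in preferences]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Tabular actor: one feature per state means the preferences are simply
# a row of a table theta[state][action]. Sizes are illustrative only.
n_states, n_actions = 3, 4
theta = [[0.0] * n_actions for _ in range(n_states)]
theta[0][2] = 0.1  # pretend one action has a slightly higher preference

probs = softmax_policy(theta[0], HPARAMS["temperature"])
uniform = softmax_policy(theta[1], HPARAMS["temperature"])
```

Under this assumption, a temperature below 1 amplifies differences between preferences, so even a small gap in `theta` produces a visibly non-uniform action distribution, while equal preferences always yield a uniform one.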
