EDGE: Explaining Deep Reinforcement Learning Policies S1 Additional Technical Details

Neural Information Processing Systems

Note that these games are two-player games; we select the runner in You-Shall-Not-Pass and the kicker in Kick-And-Defend as our target agents. As mentioned in Section 4, we download a well-trained policy for each game.




63dc7ed1010d3c3b8269faf0ba7491d4-Supplemental.pdf

Neural Information Processing Systems

In this document, we provide details and supplementary materials that do not fit into the main manuscript due to the page limit. The specific form of the center distribution is unknown, but we can still train a generator G to approximate it. If R(G, D, T) does not exceed the threshold, we choose λ = 0, i.e., no restriction on R(G, D, T), to obtain the minimal cost. If R(G, D, T) exceeds the threshold, a large λ should be applied as a penalization. According to the derivation of Eq. (3), we obtain a relaxed version of the intractable Eq. (2), expressed as follows: min … In knowledge distillation, student models are crafted using unlabeled datasets, where only the soft targets from the teachers are utilized.
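The last sentence describes training a student from the teacher's soft targets alone, with no ground-truth labels. As a minimal sketch, assuming the common temperature-scaled soft-target loss (not necessarily the exact objective used in this supplement), the student is trained to match the teacher's softened output distribution on unlabeled inputs. The function names `softmax` and `soft_target_loss` and the default temperature of 4.0 are illustrative assumptions.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits: Sequence[float],
                     teacher_logits: Sequence[float],
                     temperature: float = 4.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2.

    Hypothetical helper: only the teacher's soft targets enter the loss,
    so no labels are required and the inputs may be unlabeled or generated.
    """
    p_t = softmax(teacher_logits, temperature)  # soft targets from the teacher
    p_s = softmax(student_logits, temperature)  # student's softened prediction
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0)
    # The T^2 factor keeps gradient magnitudes roughly comparable across temperatures.
    return temperature ** 2 * kl


teacher = [2.0, 1.0, 0.1]
print(soft_target_loss(teacher, teacher))          # 0.0 when the student matches exactly
print(soft_target_loss([0.1, 1.0, 2.0], teacher))  # positive when the student disagrees
```

A higher temperature flattens both distributions, so the student also learns the teacher's relative preferences among non-maximal classes rather than only its top prediction.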