A Appendix
Neural Information Processing Systems
X belongs to A when X is sampled from distribution ν.

We use six Atari games: AirRaid, Asteroids, Pong, MsPacman, Gopher, and UpNDown. The reward functions for these environments are set as follows.

The environment settings for the maze environments are shown in Table 1. For the "maze-multireward" environment, the orange square awards the agent for

| Environment setting | Values for Maze | Values for Atari |
|---|---|---|
| Stack size | 1 | 4 |
| Frame skip | 1 | 4 |
| One-frame observation shape | (84, 84, 3) | (84, 84) |
| Agent's observation shape | (84, 84, 3) | (84, 84, 4) |
| γ (discount factor) | 0.99 | 0.99 |
| Reward clipping | -- | true |
| Terminate on life loss | -- | true |
| Sticky actions | false | false |

Table 1: Environment settings for maze and Atari.
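To illustrate how the Table 1 settings might fit together, the following Python sketch encodes both columns and shows frame stacking, frame skipping, and reward clipping. It is a minimal sketch under stated assumptions, not the authors' implementation. The names `EnvSettings`, `agent_obs_shape`, `clip_reward`, `FrameStack`, and `skip_frames` are hypothetical. "Reward clipping" is assumed to mean clamping to [-1, 1], and the "--" entries for the maze are treated as disabled. Frame skipping is assumed to repeat the chosen action and sum the rewards. With a stack size of 1, the frame is assumed to pass through unchanged, which matches the maze shapes in the table.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple


@dataclass(frozen=True)
class EnvSettings:
    """Values transcribed from Table 1; field names are hypothetical."""
    stack_size: int
    frame_skip: int
    frame_shape: Tuple[int, ...]
    gamma: float
    clip_rewards: bool            # assumption: "--" for maze treated as False
    terminate_on_life_loss: bool  # assumption: "--" for maze treated as False
    sticky_actions: bool


MAZE = EnvSettings(1, 1, (84, 84, 3), 0.99, False, False, False)
ATARI = EnvSettings(4, 4, (84, 84), 0.99, True, True, False)


def agent_obs_shape(s: EnvSettings) -> Tuple[int, ...]:
    # Assumption: stack size 1 passes the frame through unchanged; otherwise
    # the k most recent frames are stacked along a new trailing axis.
    if s.stack_size == 1:
        return s.frame_shape
    return s.frame_shape + (s.stack_size,)


def clip_reward(r: float, s: EnvSettings) -> float:
    # Assumption: clipping clamps the reward to [-1, 1], as in DQN-style setups.
    return max(-1.0, min(1.0, r)) if s.clip_rewards else r


class FrameStack:
    """Keeps the most recent k frames (hypothetical helper)."""

    def __init__(self, k: int):
        self.frames: Deque = deque(maxlen=k)

    def reset(self, first_frame) -> List:
        # Fill the buffer with the first frame so the stack is full from step 0.
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, frame) -> List:
        # The bounded deque drops the oldest frame automatically.
        self.frames.append(frame)
        return list(self.frames)


def skip_frames(env_step: Callable, action, skip: int) -> Tuple[object, float, bool]:
    # Assumption: repeat `action` for `skip` frames, summing rewards and
    # stopping early if the episode terminates. `env_step` is a hypothetical
    # callable returning (observation, reward, done).
    total, obs, done = 0.0, None, False
    for _ in range(skip):
        obs, r, done = env_step(action)
        total += r
        if done:
            break
    return obs, total, done
```

Under these assumptions, `agent_obs_shape(ATARI)` yields (84, 84, 4) and `agent_obs_shape(MAZE)` yields (84, 84, 3), consistent with the "Agent's observation shape" row.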
Oct-2-2025, 05:21:24 GMT