Learning Invariances for Policy Generalization
Combes, Remi Tachet des, Bachman, Philip, van Seijen, Harm
arXiv.org Artificial Intelligence
The grey rectangle starts on the left of the screen and can be moved with two actions, "Right" and "Jump". The goal of the game is to reach the right of the screen while avoiding the white obstacle. There is exactly one distance to the obstacle (measured in pixels) at which the agent has to choose the action "Jump" in order to pass over the obstacle. If the agent jumps at any other point, it will inevitably crash into the obstacle. A reward of 1 is granted each time the agent moves one pixel to the right (even in the air). The episode terminates when the agent reaches the right of the screen or touches the obstacle. We build a set of related tasks by varying two factors: the floor height and the position of the obstacle on the floor. The resulting set contains 1271 tasks. We use 6 of those for training and measure generalization performance as the fraction of the remaining 1265 tasks the agent can solve.
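The task family and the generalization metric described above can be sketched in a few lines of Python. This is a simplified, non-pixel illustration, not the authors' environment: the screen width, the jump distance, the value ranges for floor height and obstacle position (chosen here only so that 41 × 31 = 1271), the choice of the 6 training tasks, and all names (`Task`, `run_episode`, `generalization`, the example policies) are assumptions made for this sketch.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Tuple

SCREEN_WIDTH = 64  # hypothetical screen width in pixels
JUMP_OFFSET = 10   # hypothetical: the single distance at which "jump" clears the obstacle


@dataclass(frozen=True)
class Task:
    floor_height: int  # changes the observation only, not the dynamics
    obstacle_x: int    # horizontal position of the obstacle


Policy = Callable[[Task, int], str]  # maps (task, agent x) to "right" or "jump"


def run_episode(task: Task, policy: Policy) -> Tuple[int, bool]:
    """Return (total reward, solved) for one episode."""
    x, reward, first_jump_x = 0, 0, None
    while x < SCREEN_WIDTH - 1:
        # Only the first jump matters: a jump at the wrong distance is fatal.
        if first_jump_x is None and policy(task, x) == "jump":
            first_jump_x = x
        cleared = (first_jump_x is not None
                   and task.obstacle_x - first_jump_x == JUMP_OFFSET)
        if x + 1 == task.obstacle_x and not cleared:
            return reward, False  # touched the obstacle, episode ends
        x += 1
        reward += 1               # reward of 1 per pixel moved to the right
    return reward, True           # reached the right of the screen


# Hypothetical ranges: 41 floor heights x 31 obstacle positions = 1271 tasks.
all_tasks: List[Task] = [Task(h, o) for h, o in product(range(41), range(20, 51))]
# Hypothetical choice of the 6 training tasks.
train_tasks = [Task(h, o) for h in (0, 20, 40) for o in (25, 45)]
test_tasks = [t for t in all_tasks if t not in set(train_tasks)]


def generalization(policy: Policy, tasks: List[Task]) -> float:
    """Fraction of the given tasks that the policy solves."""
    return sum(run_episode(t, policy)[1] for t in tasks) / len(tasks)


def oracle_policy(task: Task, x: int) -> str:
    """Jumps at exactly the right distance to the obstacle."""
    return "jump" if task.obstacle_x - x == JUMP_OFFSET else "right"


def memorized_policy(task: Task, x: int) -> str:
    """Always jumps at x = 15, which is correct only when obstacle_x == 25."""
    return "jump" if x == 15 else "right"
```

In this sketch, `generalization(oracle_policy, test_tasks)` evaluates a policy that reacts to the distance to the obstacle, while `generalization(memorized_policy, test_tasks)` evaluates one that has memorized a fixed screen position. The second solves only the held-out tasks that share an obstacle position with a training task, which is the kind of failure the generalization metric is meant to expose.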
Sep-7-2018
- Country:
  - North America
    - United States > California
      - San Diego County > San Diego (0.04)
    - Canada > Quebec
      - Montreal (0.05)
  - Asia > Middle East
    - Jordan (0.04)
- Genre:
  - Research Report (1.00)
- Industry:
  - Leisure & Entertainment > Games (1.00)
- Technology: