REINFORCEjs: Gridworld with Dynamic Programming


Temporal Difference Learning Gridworld Demo

The demo exposes an agent parameter spec to play with (a spec object with fields such as spec.update), which gets eval()'d on Agent reset.

This is a toy environment called **Gridworld** that is often used as a simple model in the Reinforcement Learning literature. In this particular case:

- **State space**: Gridworld has 10x10 = 100 distinct states. The start state is the top left cell. The gray cells are walls and cannot be moved to.
- **Environment Dynamics**: In this example, Gridworld is deterministic: a given state and action always lead to the same new state.
- **Rewards**: The agent receives a reward of +1 when it is in the center square (the one that shows R 1.0), and a reward of -1 in a few states (R -1.0 is shown for these).
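The environment described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the REINFORCEjs implementation. The wall positions, the penalty squares, and the choice of (5, 5) as the "center" square are hypothetical placeholders. So is the rule that a move into a wall or off the grid leaves the agent in place. The names `toState` and `step` are made up for this sketch.

```javascript
// Minimal deterministic Gridworld sketch (illustrative only; layout is assumed).
const SIZE = 10; // 10x10 grid, so 100 distinct states numbered 0..99

// Convert (row, col) to a single state index; state 0 is the top left cell.
function toState(row, col) {
  return row * SIZE + col;
}

// Actions as [dRow, dCol]: 0 = up, 1 = down, 2 = left, 3 = right.
const ACTIONS = [[-1, 0], [1, 0], [0, -1], [0, 1]];

// Hypothetical wall cells (the real demo's wall layout is not given in the text).
const WALLS = new Set([toState(2, 1), toState(2, 2), toState(2, 3)]);

// Hypothetical reward cells: +1 at an assumed center square, -1 at a few others.
const REWARDS = new Map([
  [toState(5, 5), 1.0],
  [toState(4, 5), -1.0],
  [toState(6, 4), -1.0],
]);

// Deterministic transition: the same (state, action) always yields the same result.
// Assumption: moving into a wall or off the grid leaves the agent where it is.
function step(state, action) {
  const row = Math.floor(state / SIZE);
  const col = state % SIZE;
  const [dRow, dCol] = ACTIONS[action];
  const newRow = row + dRow;
  const newCol = col + dCol;

  const offGrid = newRow < 0 || newRow >= SIZE || newCol < 0 || newCol >= SIZE;
  let nextState = state;
  if (!offGrid && !WALLS.has(toState(newRow, newCol))) {
    nextState = toState(newRow, newCol);
  }

  // Reward depends on the state the agent ends up in; unlisted cells give 0.
  const reward = REWARDS.has(nextState) ? REWARDS.get(nextState) : 0.0;
  return { nextState, reward };
}
```

For example, `step(0, 3)` moves the agent one cell to the right of the start state, and calling it again with the same arguments returns the same result, which is what "deterministic" means here.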