Reviews: QMDP-Net: Deep Learning for Planning under Partial Observability
–Neural Information Processing Systems
The paper proposes a novel policy network architecture for partially observable environments. The network includes a filtering module and a planning module. The filtering module mimics computation of the current belief of the agent given its previous belief, the last action and the last observation. The model of the environment and the observation function are replaced with trainable neural modules. The planning module runs value iteration for an MDP, whose transition function and reward function are also trainable neural modules.
Neural Information Processing Systems
Oct-8-2024, 12:10:46 GMT
- Technology: