End-to-End Policy Gradient Method for POMDPs and Explainable Agents
Nishimori, Soichiro, Koyamada, Sotetsu, Ishii, Shin
–arXiv.org Artificial Intelligence
Real-world decision-making problems are often partially observable, and many can be formulated as a Partially Observable Markov Decision Process (POMDP). When we apply reinforcement learning (RL) algorithms to the POMDP, reasonable estimation of the hidden states can help solve the problems. Furthermore, explainable decision-making is preferable, considering their application to real-world tasks such as autonomous driving cars. We proposed an RL algorithm that estimates the hidden states by end-to-end training, and visualize the estimation as a state-transition graph. Experimental results demonstrated that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent's behavior interpretable to humans.
arXiv.org Artificial Intelligence
Apr-19-2023
- Genre:
- Research Report (0.64)
- Industry:
- Automobiles & Trucks (0.34)
- Information Technology > Robotics & Automation (0.34)
- Transportation > Ground
- Road (0.34)
- Technology: