End-to-End Policy Gradient Method for POMDPs and Explainable Agents
Nishimori, Soichiro, Koyamada, Sotetsu, Ishii, Shin
–arXiv.org Artificial Intelligence
Real-world decision-making problems are often partially observable, and many can be formulated as a Partially Observable Markov Decision Process (POMDP). When we apply reinforcement learning (RL) algorithms to the POMDP, reasonable estimation of the hidden states can help solve the problems. Furthermore, explainable decision-making is preferable, considering their application to real-world tasks such as autonomous driving cars. We proposed an RL algorithm that estimates the hidden states by end-to-end training, and visualize the estimation as a state-transition graph. Experimental results demonstrated that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent's behavior interpretable to humans.
arXiv.org Artificial Intelligence
Apr-19-2023
- Country:
- North America > United States
- New York > New York County > New York City (0.04)
- Asia
- Middle East > Jordan (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture
- Tokyo (0.16)
- Kansai > Kyoto Prefecture
- Kyoto (0.07)
- Kantō > Tokyo Metropolis Prefecture
- North America > United States
- Genre:
- Research Report (0.64)
- Industry:
- Information Technology > Robotics & Automation (0.34)
- Automobiles & Trucks (0.34)
- Transportation > Ground
- Road (0.34)
- Technology: