End-to-End Policy Gradient Method for POMDPs and Explainable Agents

Nishimori, Soichiro; Koyamada, Sotetsu; Ishii, Shin

Apr-19-2023–arXiv.org Artificial Intelligence

Real-world decision-making problems are often partially observable, and many can be formulated as a Partially Observable Markov Decision Process (POMDP). When reinforcement learning (RL) algorithms are applied to a POMDP, a reasonable estimate of the hidden states can help solve the problem. Furthermore, explainable decision-making is preferable, given the application of RL to real-world tasks such as autonomous driving. We propose an RL algorithm that estimates the hidden states through end-to-end training and visualizes the estimates as a state-transition graph. Experimental results demonstrate that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent's behavior interpretable to humans.
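To make "estimating the hidden states" concrete, the sketch below shows the classical discrete Bayes-filter belief update for a POMDP whose transition and observation models are known. This is not the algorithm proposed in the paper, which learns its state estimate end-to-end; it is only a textbook baseline for intuition. The function name `belief_update`, the table layouts `T` and `O`, and the two-state example values are all assumptions made for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """One step of a discrete Bayes filter over hidden states.

    Assumed (hypothetical) table layouts:
      T[a][s][s2] = P(s2 | s, a)   transition model
      O[s2][o]    = P(o | s2)      observation model
    """
    n = len(belief)
    # Predict: push the current belief through the transition model.
    predicted = [
        sum(belief[s] * T[action][s][s2] for s in range(n))
        for s2 in range(n)
    ]
    # Correct: weight each state by the likelihood of the observation.
    unnormalized = [O[s2][observation] * predicted[s2] for s2 in range(n)]
    z = sum(unnormalized)
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / z for u in unnormalized]


# Hypothetical two-state problem with a single "listen" action that
# leaves the hidden state unchanged and reports it correctly 85% of the time.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[0.85, 0.15], [0.15, 0.85]]

b0 = [0.5, 0.5]                       # uniform prior over the two states
b1 = belief_update(b0, 0, 0, T, O)    # observe symbol 0 once
b2 = belief_update(b1, 0, 0, T, O)    # observe symbol 0 again
```

Starting from a uniform prior, each consistent observation shifts probability mass toward state 0. A learned estimator would have to recover comparable information from observations alone, without access to `T` and `O`.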

artificial intelligence, machine learning, reinforcement learning

Country:
- Asia > Japan > Honshū (0.18)

Genre:
- Research Report (0.64)

Industry:
- Automobiles & Trucks (0.34)
- Information Technology > Robotics & Automation (0.34)
- Transportation > Ground > Road (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
