End-to-End Policy Gradient Method for POMDPs and Explainable Agents