Learning to Plan via a Multi-Step Policy Regression Method

Stefan Wagner, Michael Janschek, Tobias Uelwer, Stefan Harmeling

Jun-18-2021–arXiv.org Artificial Intelligence

We propose a new approach to speed up inference in environments that require a specific sequence of actions to be solved. Maze environments are one example: there, the agent ideally determines an optimal path. Instead of learning a policy for a single step, we want to learn a policy that predicts n actions in advance. Our proposed method, policy horizon regression (PHR), uses knowledge of the environment sampled by A2C to learn an n-dimensional policy vector in a policy distillation setup, which yields n sequential actions per observation. We test our method on the MiniGrid and Pong environments and show a drastic speedup at inference time by successfully predicting sequences of actions from a single observation.
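The abstract does not specify the loss or the layout of the policy vector, so the following is only a minimal sketch under stated assumptions. It assumes the student's n-dimensional policy vector is represented as n categorical distributions over actions (one row per future step), trained to match the A2C teacher's policies at the next n states of a teacher rollout via a summed KL divergence. The actual PHR objective may differ (the name suggests a direct regression). All function names (`horizon_distillation_loss`, `decode_actions`, `run_open_loop`) are hypothetical. The last function illustrates where the inference speedup comes from: the student is queried once per n environment steps instead of once per step.

```python
from __future__ import annotations

import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def horizon_distillation_loss(student_logits: np.ndarray,
                              teacher_probs: np.ndarray) -> float:
    """Hypothetical loss: sum over the horizon of KL(teacher_k || student_k).

    student_logits: (n, num_actions), predicted from the single observation o_t.
    teacher_probs:  (n, num_actions), teacher policy at o_t, ..., o_{t+n-1},
                    assumed to be collected while rolling out the teacher.
    """
    eps = 1e-12  # guards against log(0)
    student_probs = softmax(student_logits)
    kl = teacher_probs * (np.log(teacher_probs + eps) - np.log(student_probs + eps))
    return float(kl.sum())


def decode_actions(student_logits: np.ndarray) -> list[int]:
    """Greedy decoding: one action per row of the policy vector."""
    return student_logits.argmax(axis=-1).tolist()


def run_open_loop(policy_fn, step_fn, obs, total_steps: int):
    """Query the student once, then execute its whole action sequence.

    policy_fn(obs) -> (n, num_actions) logits; step_fn(obs, a) -> next obs.
    Both are hypothetical stand-ins for a trained network and an environment.
    Returns the final observation and the number of policy forward passes.
    """
    calls, t = 0, 0
    while t < total_steps:
        actions = decode_actions(policy_fn(obs))
        calls += 1
        for a in actions[: total_steps - t]:  # truncate the last chunk if needed
            obs = step_fn(obs, a)
            t += 1
    return obs, calls
```

As a toy usage example, a 1-D corridor where the student always predicts "move right" (action 1) for a horizon of n = 4 covers 10 steps with 3 forward passes instead of 10: `run_open_loop(lambda o: np.tile([0.0, 1.0], (4, 1)), lambda o, a: o + (1 if a == 1 else -1), 0, total_steps=10)` returns `(10, 3)`. Executing a predicted sequence open-loop trades reactivity for speed, which is why such a scheme suits environments that mainly need a fixed sequence of actions.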



