Policy-Gradient Methods for Planning

Apr-6-2023, 15:32:04 GMT–Neural Information Processing Systems

Probabilistic temporal planning attempts to find good policies for acting in domains with concurrent durative tasks, multiple uncertain outcomes, and limited resources. These domains are typically modelled as Markov decision problems and solved using dynamic programming methods. This paper demonstrates the application of reinforcement learning -- in the form of a policy-gradient method -- to these domains. Our emphasis is large domains that are infeasible for dynamic programming. Our ap- proach is to construct simple policies, or agents, for each planning task.

policy-gradient method

Neural Information Processing Systems

Apr-6-2023, 15:32:04 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (0.86)
  - Machine Learning > Reinforcement Learning (0.32)