Reinforcement Learning with Parameterized Actions
Warwick Masson, Pravesh Ranchod, George Konidaris
arXiv.org Artificial Intelligence
We introduce a model-free algorithm for learning in Markov decision processes with parameterized actions: discrete actions with continuous parameters. At each step the agent must select both which action to use and which parameters to use with that action. We introduce the Q-PAMDP algorithm for learning in these domains, show that it converges to a local optimum, and compare it to direct policy search in the goal-scoring and Platform domains.
Nov-26-2015
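The two-part choice described in the abstract (first a discrete action, then continuous parameters for that action) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Q-PAMDP implementation: the epsilon-greedy rule over discrete actions, the per-action Gaussian parameter policy, and the names `ParamPolicy` and `select_action` are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A parameterized action: a discrete action index plus its continuous parameters.
ParamAction = Tuple[int, List[float]]


@dataclass
class ParamPolicy:
    """Hypothetical parameter policy: an independent Gaussian per parameter,
    with one mean vector per discrete action (actions may differ in dimension)."""
    means: Dict[int, List[float]]
    std: float = 0.1

    def sample(self, a: int, rng: random.Random) -> List[float]:
        # Sample parameters only for the chosen discrete action.
        return [rng.gauss(m, self.std) for m in self.means[a]]


def select_action(q_values: Dict[int, float], policy: ParamPolicy,
                  epsilon: float, rng: random.Random) -> ParamAction:
    """Assumed selection rule: epsilon-greedy over discrete-action values,
    then sample that action's continuous parameters from the parameter policy."""
    # Step 1: choose which discrete action to use.
    if rng.random() < epsilon:
        a = rng.choice(sorted(q_values))
    else:
        a = max(q_values, key=lambda k: q_values[k])
    # Step 2: choose which parameters to use with that action.
    return a, policy.sample(a, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative values only: action 0 takes one parameter, action 1 takes two.
    policy = ParamPolicy(means={0: [1.0], 1: [-2.0, 0.5]}, std=0.1)
    print(select_action({0: 0.2, 1: 0.7}, policy, epsilon=0.1, rng=rng))
```

Keeping a separate parameter distribution per discrete action lets each action carry its own number of parameters, so the agent never has to produce parameters for actions it did not pick.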