Reinforcement Learning with Parameterized Actions
Masson, Warwick, Ranchod, Pravesh, Konidaris, George
arXiv.org Artificial Intelligence
We introduce a model-free algorithm for learning in Markov decision processes with parameterized actions, that is, discrete actions with continuous parameters. At each step the agent must select both which action to use and which parameters to use with that action. We present the Q-PAMDP algorithm for learning in these domains, show that it converges to a local optimum, and compare it to direct policy search in the goal-scoring and Platform domains.
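To make the two-part choice concrete, here is a minimal sketch of selecting a parameterized action: first a discrete action, then continuous parameters for that action. It is an illustration only and not the Q-PAMDP algorithm. The names `ActionSpec`, `ACTIONS`, `select_action`, the example actions, and their parameter bounds are all hypothetical. Epsilon-greedy selection over a table of action values and uniform parameter sampling are stand-ins chosen for brevity.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ActionSpec:
    """A discrete action plus (low, high) bounds for each continuous parameter."""
    name: str
    bounds: Tuple[Tuple[float, float], ...]


# Hypothetical action set; the names and bounds are assumptions for illustration.
ACTIONS: List[ActionSpec] = [
    ActionSpec("kick_to", ((0.0, 1.0), (-1.0, 1.0))),  # two parameters
    ActionSpec("shoot", ((-1.0, 1.0),)),               # one parameter
]


def select_action(
    q_values: Dict[str, float],
    epsilon: float,
    rng: random.Random,
) -> Tuple[str, Tuple[float, ...]]:
    """Return (action name, parameter tuple) for one decision step."""
    # Step 1: pick WHICH discrete action to use (epsilon-greedy stand-in).
    if rng.random() < epsilon:
        spec = rng.choice(ACTIONS)
    else:
        spec = max(ACTIONS, key=lambda a: q_values[a.name])
    # Step 2: pick the parameters FOR that action. Uniform sampling is a
    # placeholder; a learned parameter policy would go here instead.
    params = tuple(rng.uniform(low, high) for low, high in spec.bounds)
    return spec.name, params


if __name__ == "__main__":
    rng = random.Random(0)
    name, params = select_action({"kick_to": 0.1, "shoot": 0.9}, 0.1, rng)
    print(name, params)
```

The point of the sketch is that the parameter vector's length and bounds depend on the discrete action chosen, which is what distinguishes a parameterized action space from a purely discrete or purely continuous one.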
Nov-26-2015
- Country:
  - North America > United States (0.46)
  - Africa (0.28)
- Genre:
  - Research Report (0.40)
- Industry:
  - Leisure & Entertainment > Sports > Soccer (0.49)