Reinforcement Learning with Parameterized Actions
Warwick Masson, Pravesh Ranchod, George Konidaris
arXiv.org Artificial Intelligence
We introduce a model-free algorithm for learning in Markov decision processes with parameterized actions: discrete actions with continuous parameters. At each step the agent must select both which action to use and which parameters to use with that action. Our algorithm, Q-PAMDP, learns in these domains; we show that it converges to a local optimum and compare it to direct policy search in the goal-scoring and Platform domains.
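The two-level choice described above (a discrete action plus that action's continuous parameters) can be illustrated with a small toy sketch. It alternates between learning action values Q(a) over the discrete actions while each action's parameters are held fixed, and improving each action's parameters while Q is held fixed. This is only loosely in the spirit of alternating value learning with parameter search: the action names `kick` and `pass`, the reward function, the one-step episodes, and the random-search hill-climbing parameter update are all hypothetical stand-ins, not the authors' Q-PAMDP algorithm or their goal-scoring and Platform domains.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy problem: two discrete actions, each with one continuous
# parameter bounded by (low, high). Names and ranges are illustrative only.
BOUNDS: Dict[str, List[Tuple[float, float]]] = {
    "kick": [(0.0, 10.0)],  # parameter: kick power
    "pass": [(-1.0, 1.0)],  # parameter: pass angle
}


def reward(action: str, params: Tuple[float, ...]) -> float:
    """Made-up reward: 'kick' is better overall, but only with well-tuned power."""
    if action == "kick":
        return 5.0 - (params[0] - 7.0) ** 2 / 4.0
    return 2.0 - (params[0] - 0.2) ** 2


def clip(action: str, params: List[float]) -> Tuple[float, ...]:
    """Clamp each parameter into its (low, high) range for the given action."""
    return tuple(min(max(x, lo), hi) for x, (lo, hi) in zip(params, BOUNDS[action]))


def act(q, theta, eps, rng):
    """Choose a discrete action epsilon-greedily from Q, then attach its parameters."""
    if rng.random() < eps:
        a = rng.choice(list(q))
    else:
        a = max(q, key=q.get)
    return a, theta[a]


def q_phase(q, theta, episodes, alpha, eps, rng):
    """Learn Q over discrete actions while the parameter choice is held fixed."""
    for _ in range(episodes):
        a, params = act(q, theta, eps, rng)
        r = reward(a, params)
        q[a] += alpha * (r - q[a])  # one-step episodes, so the target is just r


def param_phase(theta, proposals, sigma, rng):
    """Improve each action's parameters by random-search hill climbing."""
    for a in theta:
        for _ in range(proposals):
            cand = clip(a, [x + rng.gauss(0.0, sigma) for x in theta[a]])
            if reward(a, cand) > reward(a, theta[a]):
                theta[a] = cand  # keep the proposal only if it improves reward


def train(rounds=5, seed=0):
    """Alternate the value phase and the parameter phase for a few rounds."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in BOUNDS}
    theta = {"kick": (1.0,), "pass": (0.0,)}  # deliberately poor initial parameters
    for _ in range(rounds):
        q_phase(q, theta, episodes=500, alpha=0.2, eps=0.1, rng=rng)
        param_phase(theta, proposals=300, sigma=0.5, rng=rng)
    return q, theta


if __name__ == "__main__":
    q, theta = train()
    best = max(q, key=q.get)
    print(best, theta[best], round(q[best], 2))
```

With its initial power, `kick` looks worse than `pass`, so learning Q alone would settle on the wrong discrete action; only after the parameter phase tunes the power does the value phase switch to `kick`. That coupling between the discrete choice and its parameters is what a parameterized-action learner has to handle.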
Nov-26-2015
- Country:
  - Africa > South Africa
    - Gauteng > Johannesburg (0.04)
  - Europe > France
    - Occitanie > Haute-Garonne > Toulouse (0.04)
  - North America > United States
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - North Carolina > Durham County
      - Durham (0.04)
- Genre:
  - Research Report (0.40)
- Industry:
  - Leisure & Entertainment > Sports > Soccer (0.49)