Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games

Oct-4-2020–arXiv.org Machine Learning

Hierarchical reinforcement learning (HRL) is especially popular in RTS games with combinatorial action spaces (Pang et al., 2019; Ye et al., 2020). The most closely related work is perhaps Scheduled Auxiliary Control (SAC-X) (Riedmiller et al., 2018), an HRL algorithm that trains auxiliary agents to perform primitive actions with shaped rewards and a main agent to schedule the use of those auxiliary agents with sparse rewards. Our approach differs in its treatment of the main agent: instead of learning to schedule auxiliary agents, our main agent learns to act in the entire action space by taking action guidance from the auxiliary agents. Learning in the full action space has two intuitive benefits. First, during policy evaluation, our main agent does not have to commit to a particular auxiliary agent for a fixed number of time steps, as is usually done in SAC-X. Second, the main agent is less likely to suffer from the definition of handcrafted sub-tasks, which could be incomplete or biased.
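The control-flow contrast described above can be illustrated with a toy sketch. It is not the paper's implementation and does not show training or how action guidance is applied. It only contrasts a scheduler that commits to one auxiliary agent for a fixed number of steps with a main policy that picks from the full action space at every step. All names (`rollout_scheduled`, `rollout_full_action_space`, `commit_steps`, the integer observations and actions) are hypothetical and chosen for illustration.

```python
from typing import Callable, List, Optional, Tuple

Action = int
Policy = Callable[[int], Action]      # toy signature: observation -> action
Step = Tuple[Optional[int], Action]   # (index of active auxiliary agent, action)


def rollout_scheduled(
    scheduler: Callable[[int], int],
    aux_policies: List[Policy],
    horizon: int,
    commit_steps: int,
) -> List[Step]:
    """Scheduling-style control flow (assumed for illustration): the main
    agent chooses an auxiliary agent and commits to it for `commit_steps`."""
    trace: List[Step] = []
    active = 0
    for obs in range(horizon):  # the observation is just the time step here
        if obs % commit_steps == 0:
            active = scheduler(obs)  # re-schedule only at commitment boundaries
        trace.append((active, aux_policies[active](obs)))
    return trace


def rollout_full_action_space(main_policy: Policy, horizon: int) -> List[Step]:
    """The main agent acts directly in the full action space at every step,
    so it is never locked into one auxiliary agent's behaviour."""
    return [(None, main_policy(obs)) for obs in range(horizon)]


# Toy setup: two hand-defined sub-tasks cover actions 0 and 1 only,
# while the full action space is {0, 1, 2}.
harvest: Policy = lambda obs: 0
attack: Policy = lambda obs: 1

scheduled = rollout_scheduled(
    scheduler=lambda obs: (obs // 3) % 2,
    aux_policies=[harvest, attack],
    horizon=6,
    commit_steps=3,
)
full = rollout_full_action_space(main_policy=lambda obs: obs % 3, horizon=6)

print(scheduled)
print(full)
```

In this toy setup the scheduled rollout can only ever emit actions that some auxiliary agent covers, whereas the full-action-space rollout can also reach action 2, which no hand-defined sub-task produces. This mirrors the second benefit claimed in the text, under the stated assumptions.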

artificial intelligence, machine learning, reinforcement learning

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States
  - Pennsylvania > Philadelphia County > Philadelphia (0.04)

Genre:
- Research Report (0.64)

Industry:
- Leisure & Entertainment > Games > Computer Games (0.64)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Neural Networks (0.95)
