Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games

Huang, Shengyi; Ontañón, Santiago

arXiv.org Machine Learning 

Hierarchical reinforcement learning (HRL) is especially popular in RTS games with combinatorial action spaces (Pang et al., 2019; Ye et al., 2020). Perhaps the most closely related work is Scheduled Auxiliary Control (SAC-X) (Riedmiller et al., 2018). SAC-X is an HRL algorithm that trains auxiliary agents to perform primitive actions with shaped rewards, and a main agent that schedules the use of those auxiliary agents with sparse rewards. Our approach differs in its treatment of the main agent. Instead of learning to schedule auxiliary agents, our main agent learns to act in the entire action space, taking action guidance from the auxiliary agents. Because the main agent learns in the full action space, our approach has two intuitive benefits. First, during policy evaluation, our main agent does not have to commit to a particular auxiliary agent for a fixed number of time steps, as is usually done in SAC-X. Second, the main agent is less likely to suffer from how the handcrafted sub-tasks are defined, since such definitions could be incomplete or biased.
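To make the first benefit concrete, the following toy sketch contrasts two evaluation-time control loops. It rests on our own simplifying assumptions and is not the authors' implementation or SAC-X's code: observations are plain integers, policies are ordinary functions, and the names `sacx_style_rollout`, `full_action_space_rollout`, `commit_steps`, and the toy environment are all hypothetical. In the scheduled loop, the chosen auxiliary agent is locked in for `commit_steps` steps; in the full-action-space loop, the main agent can pick a different primitive action at every step.

```python
from typing import Callable, List, Sequence

# Hypothetical toy types: an observation is an int, an action is an int.
Policy = Callable[[int], int]          # observation -> primitive action
EnvStep = Callable[[int, int], int]    # (observation, action) -> next observation


def sacx_style_rollout(scheduler: Callable[[int], int],
                       aux_policies: Sequence[Policy],
                       env_step: EnvStep, obs: int,
                       horizon: int, commit_steps: int) -> List[int]:
    """Scheduler picks an auxiliary agent, which then acts for a fixed number of steps."""
    actions: List[int] = []
    t = 0
    while t < horizon:
        k = scheduler(obs)  # index of the auxiliary agent to run next
        # The chosen auxiliary agent acts until the commitment window ends.
        for _ in range(min(commit_steps, horizon - t)):
            a = aux_policies[k](obs)
            actions.append(a)
            obs = env_step(obs, a)
            t += 1
    return actions


def full_action_space_rollout(main_policy: Policy, env_step: EnvStep,
                              obs: int, horizon: int) -> List[int]:
    """Main agent chooses a primitive action directly at every step."""
    actions: List[int] = []
    for _ in range(horizon):
        a = main_policy(obs)
        actions.append(a)
        obs = env_step(obs, a)
    return actions


if __name__ == "__main__":
    step = lambda o, a: o + 1  # toy environment: the observation is a step counter
    aux = [lambda o: 0, lambda o: 1]  # auxiliary agent 0 always plays 0, agent 1 always plays 1
    print(sacx_style_rollout(lambda o: o % 2, aux, step, 0, 6, 3))  # [0, 0, 0, 1, 1, 1]
    print(full_action_space_rollout(lambda o: o % 2, step, 0, 6))   # [0, 1, 0, 1, 0, 1]
```

With the same alternating preference, the scheduled loop can only switch behavior every three steps, while the full-action-space agent alternates at every step.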
