Searching for Programmatic Policies in Semantic Spaces
Moraes, Rubens O., Lelis, Levi H. S.
Syntax-guided synthesis is commonly used to generate programs encoding policies. In this approach, the set of programs that can be written in a domain-specific language defines the search space, and an algorithm searches within this space for programs that encode strong policies. In this paper, we propose an alternative method for synthesizing programmatic policies, where we search within an approximation of the language's semantic space. We hypothesized that searching in semantic spaces is more sample-efficient than searching in syntax-based spaces. Our rationale is that the search is more efficient if the algorithm evaluates different agent behaviors as it moves through the space, a feature often missing in syntax-based spaces because small changes in a program's syntax often do not change the agent's behavior. We define semantic spaces by learning a library of programs that exhibit different agent behaviors. We then approximate the semantic space by defining a neighborhood function for local search algorithms, in which parts of the current candidate program are replaced with programs from the library. We evaluated our hypothesis in a real-time strategy game called MicroRTS. Empirical results support our hypothesis that searching in semantic spaces can be more sample-efficient than searching in syntax-based spaces.
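The library-based neighborhood described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy arithmetic DSL, the hypothetical `build_library` enumeration (which keeps one program per distinct output signature as a stand-in for the learned library), and the hypothetical distance-to-target `score` (a stand-in for evaluating a policy by playing MicroRTS matches) are all invented for illustration. Neighbors are generated by swapping any subtree of the candidate for a library program, and plain best-improvement hill climbing searches over them.

```python
import itertools

# Hypothetical toy DSL: a program is "x", an int constant, or a tuple
# (op, left, right) with op in {"add", "mul"}. A program's "behavior" is
# its outputs on these sample inputs (a stand-in for agent behavior).
SAMPLE_INPUTS = (-2, -1, 0, 1, 2, 3)


def run(prog, x):
    """Evaluate a program on input x."""
    if prog == "x":
        return x
    if isinstance(prog, int):
        return prog
    op, left, right = prog
    a, b = run(left, x), run(right, x)
    return a + b if op == "add" else a * b


def behavior(prog):
    return tuple(run(prog, x) for x in SAMPLE_INPUTS)


def build_library(max_rounds=2):
    """Keep one program per distinct behavior (stand-in for a learned library)."""
    library = {}
    for p in ("x", 0, 1, 2):
        library.setdefault(behavior(p), p)
    for _ in range(max_rounds):
        current = list(library.values())  # snapshot before adding new programs
        for op, a, b in itertools.product(("add", "mul"), current, current):
            library.setdefault(behavior((op, a, b)), (op, a, b))
    return list(library.values())


def subtrees(prog, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, prog
    if isinstance(prog, tuple):
        for i in (1, 2):
            yield from subtrees(prog[i], path + (i,))


def replace(prog, path, new):
    """Return a copy of prog with the node at path replaced by new."""
    if not path:
        return new
    node = list(prog)
    node[path[0]] = replace(prog[path[0]], path[1:], new)
    return tuple(node)


def semantic_neighbors(prog, library):
    """Neighbors: replace any subtree of prog with a program from the library."""
    for path, _ in subtrees(prog):
        for lib_prog in library:
            yield replace(prog, path, lib_prog)


def make_score(target):
    """Hypothetical objective: negative absolute error against a target function."""
    def score(prog):
        return -sum(abs(run(prog, x) - target(x)) for x in SAMPLE_INPUTS)
    return score


def hill_climb(start, library, score, max_steps=50):
    """Best-improvement local search over the semantic neighborhood."""
    current, current_score = start, score(start)
    for _ in range(max_steps):
        best, best_score = current, current_score
        for cand in semantic_neighbors(current, library):
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
        if best_score <= current_score:  # local optimum: no neighbor improves
            break
        current, current_score = best, best_score
    return current, current_score
```

For example, with `library = build_library()` and `score = make_score(lambda x: (x + 1) ** 2)`, `hill_climb("x", library, score)` reaches a score of 0 after one step, because the library already contains a program with that behavior. Each program substituted at a given node is behaviorally distinct from the other library entries, which is the intuition behind searching a semantic rather than a syntactic space.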
A Competition Winning Deep Reinforcement Learning Agent in microRTS
Scripted agents have predominantly won the five previous iterations of the IEEE microRTS (µRTS) competitions hosted at CIG and CoG. Although Deep Reinforcement Learning (DRL) algorithms have made significant strides in real-time strategy (RTS) games, their adoption in this primarily academic competition has been limited by the considerable training resources required and the inherent complexity of creating and debugging such agents. In a benchmark without performance constraints, RAISocketAI regularly defeated the two prior competition winners. As the first competition-winning DRL submission, it can serve as a benchmark for future microRTS competitions and a starting point for future DRL research. Iteratively fine-tuning the base policy and transferring it to specific maps were critical to RAISocketAI's winning performance, and these strategies can be used to train future DRL agents economically. Further work in imitation learning, using behavior cloning and then fine-tuning the resulting models with DRL, has proven promising as an efficient way to bootstrap models with demonstrated, competitive behaviors.
Deep reinforcement learning (DRL) has proven powerful at solving complex problems that require several steps to achieve a goal, such as Atari games (Mnih et al., 2013), continuous control tasks (Lillicrap et al., 2016), and even real-time strategy (RTS) games like StarCraft II (Vinyals et al., 2019). The StarCraft II grandmaster agent AlphaStar was trained with thousands of CPUs and GPUs/TPUs for several weeks.
RTS games are particularly challenging for DRL for several reasons: (1) the observation and action spaces are large and varied, with different terrain and unit types; (2) each unit type can have different actions and abilities; (3) each action can control several units at once; (4) rewards are sparse (win, loss, or tie) and delayed by possibly several thousand timesteps; (5) winning requires combining tactical (micro) and strategic (macro) decisions; (6) actions must be taken in real time (i.e., the game won't wait for the agent to take an action); (7) the agent might not have full visibility of the game state (i.e., fog of war); and (8) events in the game might be non-deterministic. microRTS includes many aspects of RTS games in simplified form: different unit types, unit-specific actions, terrain, resource collection and its use to build units, and unit-to-unit combat in which units have different strengths and weaknesses. The IEEE microRTS competitions have been hosted at the Conference on Games (CoG) nearly every year since 2019 and, before that, at the Conference on Computational Intelligence and Games (CIG) from 2017 (Ontañón et al., 2018).