Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games
Schäfer, Lukas, Jones, Logan, Kanervisto, Anssi, Cao, Yuhan, Rashid, Tabish, Georgescu, Raluca, Bignell, Dave, Sen, Siddhartha, Gavito, Andrea Treviño, Devlin, Sam
–arXiv.org Artificial Intelligence
Video games have served as useful benchmarks for the decision-making community, but going beyond Atari games towards training agents in modern games has been prohibitively expensive for the vast majority of the research community. Recent progress in the research, development and open release of large vision models has the potential to amortize some of these costs across the community. However, it is currently unclear which of these models have learnt representations that retain information critical for sequential decision making. Towards enabling wider participation in the research of game-playing agents in modern games, we present a systematic study of imitation learning with publicly available visual encoders compared to the typical, task-specific, end-to-end training approach in Minecraft, Minecraft Dungeons and Counter-Strike: Global Offensive.
Figure 1: Representative screenshots of all games studied in this paper.
Work was conducted during an internship at Microsoft Research.
Video games do not only serve as benchmarks but also represent a vast entertainment industry, where AI agents may eventually have applications in game development, including game testing and game design (Jacob et al., 2020; Gillberg et al., 2023). In the past, video game research often necessitated close integration with the games themselves to obtain game-specific information and establish a scalable interface for training agents. To eliminate integration costs during training, we use behavior cloning to train agents entirely offline, using previously collected human gameplay data. Although prior research has explored encoding images into lower-dimensional representations for behavior cloning, these studies primarily targeted robotics applications (Nair et al., 2022), where images often resemble real-world scenes.
Inspired by the challenges and potential applications in video games, we investigate the following research question: How can images be encoded for data-efficient imitation learning in modern video games? To answer this question, we compare end-to-end trained visual encoders with pre-trained visual encoders in three modern video games: Minecraft, Minecraft Dungeons and Counter-Strike: Global Offensive (CS:GO).
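As a rough illustration of the offline setup described above, the following minimal sketch trains a behavior-cloning policy head on features from a frozen encoder. Everything in it is an assumption made for illustration and not the paper's method: the "encoder" is a fixed random projection standing in for a pre-trained model, the 16-pixel "frames" and the left/right demonstrator rule are toy data, and the names `make_frozen_encoder`, `policy_probs` and `train_bc_head` are hypothetical.

```python
import math
import random


def make_frozen_encoder(in_dim, out_dim, seed=0):
    """Hypothetical stand-in for a pre-trained encoder: a fixed random projection with tanh."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(frame):
        return [math.tanh(sum(w * x for w, x in zip(row, frame))) for row in weights]

    return encode


def policy_probs(W, b, feature):
    """Softmax action distribution of a linear policy head."""
    logits = [sum(w * x for w, x in zip(row, feature)) + bias for row, bias in zip(W, b)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_cross_entropy(W, b, features, actions):
    """Average negative log-likelihood of the demonstrated actions."""
    return -sum(math.log(policy_probs(W, b, f)[a]) for f, a in zip(features, actions)) / len(actions)


def train_bc_head(features, actions, num_actions, steps=300, lr=0.1):
    """Behavior cloning: full-batch gradient descent on cross-entropy; the encoder is never updated."""
    dim, n = len(features[0]), len(features)
    W = [[0.0] * dim for _ in range(num_actions)]
    b = [0.0] * num_actions
    for _ in range(steps):
        grad_W = [[0.0] * dim for _ in range(num_actions)]
        grad_b = [0.0] * num_actions
        for f, a in zip(features, actions):
            probs = policy_probs(W, b, f)
            for k in range(num_actions):
                err = probs[k] - (1.0 if k == a else 0.0)
                grad_b[k] += err / n
                for j in range(dim):
                    grad_W[k][j] += err * f[j] / n
        for k in range(num_actions):
            b[k] -= lr * grad_b[k]
            for j in range(dim):
                W[k][j] -= lr * grad_W[k][j]
    return W, b


# Toy "demonstrations" (assumption): 16-pixel frames; the demonstrator picks action 0
# when the left half is brighter than the right half, otherwise action 1.
rng = random.Random(1)
frames = [[rng.random() for _ in range(16)] for _ in range(200)]
actions = [0 if sum(fr[:8]) > sum(fr[8:]) else 1 for fr in frames]

encode = make_frozen_encoder(in_dim=16, out_dim=8)
features = [encode(fr) for fr in frames]  # encoded once, offline
W, b = train_bc_head(features, actions, num_actions=2)

initial_loss = math.log(2)  # loss of the zero-initialised head on two actions
final_loss = mean_cross_entropy(W, b, features, actions)
print(f"cross-entropy: {initial_loss:.3f} -> {final_loss:.3f}")
```

Because the encoder is frozen, frames can be encoded once and cached, so only the small policy head is trained. An end-to-end variant would instead backpropagate the same cross-entropy loss through the encoder weights.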
Dec-4-2023
- Genre:
- Research Report
- Experimental Study > Negative Result (0.68)
- New Finding (1.00)
- Industry:
- Leisure & Entertainment > Games > Computer Games (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Games (1.00)
- Machine Learning
- Neural Networks > Deep Learning (0.96)
- Reinforcement Learning (1.00)
- Robots (1.00)