On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning

Yifan Xu, Nicklas Hansen, Zirui Wang, Yung-Chieh Chan, Hao Su, Zhuowen Tu

arXiv.org (Artificial Intelligence)

Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so. Recently, model-based RL algorithms have greatly improved sample efficiency by concurrently learning an internal model of the world and supplementing real environment interactions with imagined rollouts for policy improvement. However, learning an effective model of the world from scratch is challenging, and stands in stark contrast to humans, who rely heavily on world understanding and visual cues when learning new skills. In this work, we investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster. We propose Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and finetuning of learned world models. Through offline multi-task pretraining and online cross-task finetuning, we achieve substantial improvements over a baseline trained from scratch: we improve the mean performance of the model-based algorithm EfficientZero by 23%, and by as much as 71% in some instances.

Figure 1. Model-Based Cross-Task Transfer (XTRA): a sample-efficient online RL framework with scalable pretraining and finetuning of learned world models using auxiliary data from offline tasks.

The Arcade Learning Environment (ALE; Bellemare et al., 2013), a suite of Atari 2600 games, has long served as a benchmark for RL algorithms. Most recently, EfficientZero (Ye et al., 2021), a model-based RL algorithm, has demonstrated impressive sample efficiency, surpassing human-level performance with as little as 2 hours of real-time game play in select Atari 2600 games from the ALE. This achievement is attributed, in part, to the algorithm concurrently learning an internal model of the environment from interaction, and using that learned model to imagine (simulate) further interactions for planning and policy improvement, thus reducing its reliance on real environment interactions for skill acquisition. Humans, by contrast, rely heavily on prior knowledge and visual cues when learning new skills: a study found that human players easily identify visual cues about game mechanics when exposed to a new game, and that human performance is severely degraded if such cues are removed or conflict with prior experiences (Dubey et al., 2018). The paradigm of large-scale pretraining has recently been extended to visuo-motor control in various forms, e.g., by leveraging frozen (no finetuning) pretrained representations (Xiao et al., 2022; Parisi et al., 2022) or by finetuning in a supervised setting (Reed et al., 2022; Lee et al., 2022).

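The core mechanic described above, supplementing real experience with rollouts imagined inside a learned world model, can be made concrete with a short sketch. Everything below (the `WorldModel` class, its fixed transition rule, the random placeholder policy) is a hypothetical stand-in rather than the EfficientZero implementation, which uses learned neural networks and MCTS-style planning; the sketch only shows how a trajectory is simulated without touching the real environment.

```python
# A minimal sketch of imagined rollouts in model-based RL.
# WorldModel and imagined_rollout are illustrative stand-ins,
# not the EfficientZero implementation.

import random

class WorldModel:
    """Toy stand-in for a learned dynamics model: (state, action) -> (next_state, reward)."""

    def predict(self, state, action):
        # A real world model is a learned neural network; this fixed
        # transition rule just keeps the sketch runnable.
        next_state = tuple(s + 0.1 * action for s in state)
        reward = -sum(abs(s) for s in next_state)
        return next_state, reward

def imagined_rollout(model, policy, start_state, horizon=5):
    """Simulate `horizon` steps entirely inside the learned model.

    The imagined trajectory can then be used for planning and policy
    improvement without any real environment interaction.
    """
    state, trajectory = start_state, []
    for _ in range(horizon):
        action = policy(state)
        next_state, reward = model.predict(state, action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory

policy = lambda state: random.choice([-1, 0, 1])  # placeholder policy
for step in imagined_rollout(WorldModel(), policy, (0.0, 0.0)):
    print(step)
```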

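The two-stage XTRA recipe, offline multi-task pretraining followed by online cross-task finetuning with auxiliary offline data, can likewise be sketched at a high level. The `ToyWorldModel` below and the simple interleaving schedule are assumptions made purely for illustration; they show only the control flow of pretraining on several source tasks and then mixing target-task updates with auxiliary batches from those tasks, not the authors' actual training code.

```python
# A high-level sketch of the XTRA-style recipe described above.
# ToyWorldModel (a single scalar parameter) and the interleaving
# schedule are deliberately trivial stand-ins so the control flow
# is runnable; they are not the authors' implementation.

class ToyWorldModel:
    def __init__(self):
        self.theta = 0.0  # stand-in for world-model parameters

    def update(self, batch):
        # Pretend gradient step: nudge theta toward the batch mean.
        target = sum(batch) / len(batch)
        self.theta += 0.1 * (target - self.theta)

def pretrain_offline(model, offline_datasets):
    """Stage 1: offline multi-task pretraining on logged batches
    from a set of source tasks."""
    for dataset in offline_datasets:
        for batch in dataset:
            model.update(batch)
    return model

def finetune_online(model, target_batches, offline_datasets):
    """Stage 2: online cross-task finetuning. Each update on the new
    target task is interleaved with an auxiliary batch drawn from
    the offline source tasks."""
    for i, batch in enumerate(target_batches):
        model.update(batch)                      # target-task update
        aux = offline_datasets[i % len(offline_datasets)]
        model.update(aux[i % len(aux)])          # auxiliary cross-task update
    return model

# Toy data: two offline source tasks, then a stream of target-task batches.
model = pretrain_offline(ToyWorldModel(), [[[1.0, 2.0]], [[3.0]]])
model = finetune_online(model, [[5.0], [6.0]], [[[1.0]], [[2.0]]])
print(f"final parameter: {model.theta:.3f}")
```

The essential structure is that learning on the target task does not start from scratch: online updates begin from pretrained model weights and continue to draw on offline data from other tasks, which the abstract credits for the reported gains over a from-scratch baseline.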