Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun

arXiv.org Artificial Intelligence 

Representation learning in Reinforcement Learning (RL) has gained increasing attention in recent years from both the theoretical and empirical research communities (Schwarzer et al., 2020; Laskin et al., 2020) due to its potential to enable sample-efficient non-linear function approximation, its benefits in multitask settings (Zhang et al., 2020; Yang et al., 2022; Sodhani et al., 2021), and the opportunity to leverage advances in representation learning from related areas such as computer vision and natural language processing. Despite this interest, a gap remains between the theoretical and empirical literature: theoretically sound methods are seldom evaluated or even implemented and often rely on strong assumptions, while empirical techniques are not backed by theoretical guarantees even under stylized assumptions. This leaves open the key challenge of designing representation learning methods that are both theoretically sound and empirically effective. In this work, we tackle this challenge for a special class of problems called Block MDPs, in which the agent's high-dimensional, rich observations are generated from latent states through a fixed but unknown mapping, and each observation is generated by exactly one latent state. Prior works (Dann et al., 2018; Du et al., 2019; Misra et al., 2020; Zhang et al., 2020; Sodhani et al., 2021) have motivated the Block MDP model through scenarios such as navigation tasks and image-based robotics tasks, where observations can often be reasonably mapped to latent physical locations and states.
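
For concreteness, the following is a minimal formal sketch of the Block MDP structure described above; the notation ($\mathcal{S}$, $\mathcal{X}$, $q$, $\phi^\star$, etc.) is illustrative and not necessarily the paper's own.

A Block MDP can be written as a tuple $(\mathcal{S}, \mathcal{A}, \mathcal{X}, p, q, r)$: a finite latent state space $\mathcal{S}$, an action space $\mathcal{A}$, a rich (possibly infinite) observation space $\mathcal{X}$, latent transition dynamics $p(s' \mid s, a)$, an emission distribution $q(x \mid s)$, and a reward function $r$. The defining "block" structure is that emission distributions have disjoint supports across latent states,
\[
\operatorname{supp}\big(q(\cdot \mid s)\big) \cap \operatorname{supp}\big(q(\cdot \mid s')\big) = \emptyset \quad \text{for all } s \neq s',
\]
so each observation $x$ is generated by exactly one latent state, and there exists a fixed but unknown decoder $\phi^\star: \mathcal{X} \to \mathcal{S}$ with $\phi^\star(x) = s$ whenever $x \in \operatorname{supp}\big(q(\cdot \mid s)\big)$. The agent only ever observes $x \in \mathcal{X}$; the latent state $s$ is never revealed.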