Reinforcement Learning in Low-Rank MDPs with Density Features

Audrey Huang, Jinglin Chen, Nan Jiang

arXiv.org Artificial Intelligence 

The theory of reinforcement learning (RL) in large state spaces has developed rapidly. In the model-free regime, how to use powerful function approximation to learn value functions has been extensively studied in both the online and the offline settings (Jiang et al., 2017; Jin et al., 2020b,c; Xie et al., 2021); this line of work also builds the theoretical foundations connecting RL with (discriminative) supervised learning. On the other hand, generative models for unsupervised/self-supervised learning, which define a sampling distribution explicitly or implicitly, are becoming increasingly powerful (Devlin et al., 2018; Goodfellow et al., 2020), yet how to leverage them to address the key challenges in RL remains under-investigated. Prior works on RL with unsupervised-learning oracles exist (Du et al., 2019; Feng et al., 2020), but they often consider models such as block MDPs, which are more restrictive than the model structures typically considered in the value-based setting, such as low-rank MDPs.

In this paper, we study model-free RL in low-rank MDPs with density features for state occupancy estimation. In a low-rank MDP, the transition matrix can be factored into the product of two matrices. The left matrix is known to serve as powerful features for value-based learning (Jin et al., 2020b), as it can be used to approximate the Bellman backup of any function. The right matrix, in turn, can be used to represent the state-occupancy distributions of policies, yet how to leverage such density features (without knowledge of the left matrix) in offline or online RL is unknown.
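To make the factorization referenced above concrete, here is a brief sketch under assumed standard notation (not notation taken from the paper itself). In a low-rank MDP, the transition kernel at step h factors as

\[ T_h(s' \mid s, a) = \langle \phi_h(s, a), \, \mu_h(s') \rangle, \]

where \phi_h is the left factor (the features used for value-based learning) and \mu_h is the right factor (the density features). For any policy \pi with state-action occupancy d_h^\pi, the next-step state occupancy satisfies

\[ d_{h+1}^\pi(s') = \sum_{s, a} T_h(s' \mid s, a) \, d_h^\pi(s, a) = \Big\langle \sum_{s, a} \phi_h(s, a) \, d_h^\pi(s, a), \; \mu_h(s') \Big\rangle, \]

so every state occupancy at step h+1 lies in the span of the coordinates of \mu_h. This is the sense in which the right matrix represents state-occupancy distributions, mirroring how \phi_h can represent the Bellman backup of any function in value-based learning.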
