Reinforcement Learning

Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning

Gen Li (CUHK), Wenhao Zhan (Princeton), Jason D. Lee (Princeton), Yuejie Chi (CMU), Yuxin Chen

Neural Information Processing Systems

This paper studies tabular reinforcement learning (RL) in the hybrid setting, which assumes access to both an offline dataset and online interactions with the unknown environment. A central question boils down to how to efficiently utilize online data to strengthen and complement the offline dataset and enable effective policy fine-tuning. Leveraging recent advances in reward-agnostic exploration and offline RL, we design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL -- in terms of sample complexities. The proposed algorithm does not require any reward information during data collection. Our theory is developed based on a new notion called single-policy partial concentrability, which captures the trade-off between distribution mismatch and miscoverage and guides the interplay between offline and online data.




Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation

Neural Information Processing Systems

Reinforcement learning (RL) is a sequential decision-making problem in which an agent tries to maximize its expected cumulative reward by interacting with an unknown environment over time.