Non-asymptotic Performances of Robust Markov Decision Processes

Yang, Wenhao, Zhang, Zhihua

arXiv.org Machine Learning 

Markov Decision Processes (MDPs) are key mathematical models in Reinforcement Learning (RL). Alongside RL's empirical success [Haarnoja et al., 2018, Mnih et al., 2015, 2016, Silver et al., 2016], many works provide insightful and solid theoretical understanding of RL. The difficulty of solving an MDP stems mainly from the reward and transition probability, which are usually not known exactly to the observer. To deal with this situation, one common approach resorts to offline methods, where the agent only has access to a given explorable dataset generated by given strategies. Many practical deep RL algorithms employ the offline method and achieve state-of-the-art results [Mnih et al., 2015, Lillicrap et al., 2015, Fujimoto et al., 2019]. In addition to this empirical success, there is a flourishing body of work on offline RL from a theoretical perspective. Some prior works [Chen and Jiang, 2019, Agarwal et al., 2020, Duan et al., 2021] have provided solid results on model-free offline methods, while others [Sidford et al., 2018, Xie et al., 2019, Yin and Wang, 2020, Yin et al., 2020] consider model-based approaches. However, Mannor et al. [2004] showed that model-based approaches, which directly estimate the transition probability from an offline dataset, can be quite sensitive to estimation errors.
