Non-asymptotic Performances of Robust Markov Decision Processes

May-9-2021–arXiv.org Machine Learning

Markov Decision Processes (MDPs) are key mathematical models in Reinforcement Learning (RL). Alongside the empirical success of RL [Haarnoja et al., 2018, Mnih et al., 2015, 2016, Silver et al., 2016], many works provide insightful and solid theoretical understanding of it. The difficulty of solving an MDP is mainly due to the reward and transition probability, whose exact values are usually unknown to the observer. To deal with this situation, one common approach resorts to offline methods, where the agent only has access to a given explorable dataset generated by given strategies. Many practical deep RL algorithms employ the offline method and achieve state-of-the-art success [Mnih et al., 2015, Lillicrap et al., 2015, Fujimoto et al., 2019]. In addition to this empirical success, there is a flourishing body of theoretical work on offline RL. Some prior works [Chen and Jiang, 2019, Agarwal et al., 2020, Duan et al., 2021] have provided solid results on model-free offline methods, while others [Sidford et al., 2018, Xie et al., 2019, Yin and Wang, 2020, Yin et al., 2020] consider model-based approaches. However, Mannor et al. [2004] showed that model-based approaches that estimate the transition probability directly from an offline dataset can be quite sensitive to estimation errors.
