Online Policy Optimization for Robust MDP

Dong, Jing, Li, Jingwei, Wang, Baoxiang, Zhang, Jingzhao

Sep-28-2022–arXiv.org Artificial Intelligence

The rapid progress of reinforcement learning (RL) algorithms enables trained agents to navigate around complicated environments and solve complex tasks. The standard reinforcement learning methods, however, may fail catastrophically in another environment, even if the two environments only differ slightly in dynamics [Farebrother et al., 2018, Packer et al., 2018, Cobbe et al., 2019, Song et al., 2019, Raileanu and Fergus, 2021]. In practical applications, such mismatch of environment dynamics are common and can be caused by a number of reasons, e.g., model deviation due to incomplete data, unexpected perturbation and possible adversarial attacks. Part of the sensitivity of standard RL algorithms stems from the formulation of the underlying Markov decision process (MDP). In a sequence of interactions, MDP assumes the dynamic to be unchanged, and the trained agent to be tested on the same dynamic thereafter. To model the potential mismatch between system dynamics, the framework of robust MDP is introduced to account for the uncertainty of the parameters of the MDP [Satia and Lave Jr, 1973, White III and Eldeib, 1994, Nilim and El Ghaoui, 2005, Iyengar, 2005].

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

Sep-28-2022

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Washington > King County > Seattle (0.04)
- Asia > China
  - Hong Kong (0.04)
  - Guangdong Province > Shenzhen (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.34)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found