Reward-Mixing MDPs with a Few Latent Contexts are Learnable

Kwon, Jeongyeol, Efroni, Yonathan, Caramanis, Constantine, Mannor, Shie

Oct-5-2022–arXiv.org Artificial Intelligence

Reinforcement learning (RL) in partially observable systems is a challenging problem. While the partially observable Markov decision process (POMDP) is a versatile framework, POMDPs are generally hard to learn, primarily because the optimal policy depends on the entire history of the process [40, 28]. Due to this fundamental hardness, it is important to consider sub-classes of POMDPs that allow tractable solutions for a variety of applications. We are interested in a special and prevalent sub-class of POMDPs where the latent (unobservable) parts of the system remain static in each episode. Specifically, we consider the framework of Latent MDPs (LMDPs), which has been studied in several works (e.g., [8, 5, 22, 41, 30]). In LMDPs, one MDP is randomly chosen from M possible candidate models at the beginning of every episode, and an agent interacts with the chosen MDP for H time steps of an episode. However, the identity of the chosen MDP, which we call the latent context, is unknown to the agent. To learn near-optimal policies with latent contexts, existing POMDP solutions would require strong assumptions on reachability of the system (e.g., [2, 21]) or certain separability assumptions (e.g., see conditions

Most work is done while the author is at The University of Texas at Austin.
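To make the LMDP interaction protocol concrete, the following is a minimal sketch of a single episode: a context is drawn once at the start, the agent acts for H steps, and the context is never passed to the policy. Everything here is an illustrative assumption rather than the authors' construction: the function name `run_episode`, the table layout for transitions and rewards, the deterministic rewards, the fixed start state, and the toy two-state, two-action, M = 2 instance (which happens to share transitions across contexts and differ only in rewards).

```python
import random


def run_episode(mdps, weights, policy, horizon, rng):
    """Simulate one episode of a toy latent MDP (hypothetical helper).

    mdps    : list of (transition, reward) pairs, one per latent context;
              transition[s][a] is a distribution over next states,
              reward[s][a] is a deterministic reward (a simplifying assumption).
    weights : mixing weights over the M contexts.
    policy  : callable (state, history) -> action; it never sees the context.
    """
    # The latent context is sampled once and stays fixed for the whole episode.
    context = rng.choices(range(len(mdps)), weights=weights, k=1)[0]
    transition, reward = mdps[context]

    state = 0  # assumed fixed initial state
    history = []
    for _ in range(horizon):
        action = policy(state, history)
        r = reward[state][action]
        next_probs = transition[state][action]
        next_state = rng.choices(range(len(next_probs)), weights=next_probs, k=1)[0]
        history.append((state, action, r))
        state = next_state

    # The context is returned for inspection only; a learner would not observe it.
    return context, history


# Toy instance: M = 2 contexts, 2 states, 2 actions, shared transitions.
shared_T = [
    [[0.9, 0.1], [0.2, 0.8]],  # from state 0, under action 0 / action 1
    [[0.5, 0.5], [0.1, 0.9]],  # from state 1, under action 0 / action 1
]
mdps = [
    (shared_T, [[1.0, 0.0], [0.0, 1.0]]),  # rewards under context 0
    (shared_T, [[0.0, 1.0], [1.0, 0.0]]),  # rewards under context 1
]

rng = random.Random(0)
H = 5
context, history = run_episode(mdps, [0.5, 0.5], lambda s, h: 0, H, rng)
```

Because the policy receives only the current state and the observed history, any information about the context has to be inferred from the rewards and transitions seen so far, which is why the optimal policy in general depends on the full history.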

artificial intelligence, machine learning, probability


Country:
- North America > United States
  - Wisconsin (0.04)
  - Texas (0.04)
  - Massachusetts > Hampshire County
    - Amherst (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report (0.81)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
