Reinforcement Learning in Rich-Observation MDPs using Spectral Methods

Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

arXiv.org Artificial Intelligence 

Designing effective exploration-exploitation algorithms in Markov decision processes (MDPs) with large state-action spaces is a central challenge in reinforcement learning (RL), since learning performance degrades with the number of states and actions. In practice, however, MDPs often exhibit a low-dimensional latent structure, where a small hidden state space is observable through a possibly large number of observations. In this paper, we study the setting of rich-observation Markov decision processes (ROMDPs), where hidden states are mapped to observations through an injective mapping, so that each observation can be generated by only one hidden state. While this mapping is unknown a priori, we introduce a spectral decomposition method that consistently estimates how observations are clustered into the hidden states. The estimated clustering is then integrated into an optimistic algorithm for RL (UCRL), which operates on the smaller clustered space. The resulting algorithm proceeds through phases, and we show that its per-step regret (i.e., the difference in cumulative reward between the optimal policy and the algorithm, averaged per step) decreases as more observations are clustered together and eventually matches the (ideal) performance of an RL algorithm run directly on the hidden MDP.
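As an illustration of the clustering idea only (a minimal sketch, not the paper's spectral decomposition method or its integration with UCRL), the snippet below clusters observations by their empirical next-observation distributions: under the ROMDP assumption, observations generated by the same hidden state share the same transition profile, so a low-rank (SVD) projection of the empirical transition matrix followed by a simple k-means on its rows recovers the partition once the estimates are accurate enough. The function name `cluster_observations`, the use of a single action-aggregated count matrix, and the assumption that the number of hidden states is known are all illustrative choices, not part of the paper.

```python
import numpy as np

def cluster_observations(counts, n_hidden, seed=0):
    """Cluster observations with similar empirical transition profiles.

    counts   : array of shape (n_obs, n_obs); counts[o, o'] is the number of
               observed transitions from observation o to o' (aggregated over
               actions here, purely for illustration).
    n_hidden : assumed-known number of hidden states / clusters.
    Returns  : labels[o] = estimated hidden-state cluster of observation o.
    """
    # Empirical next-observation distribution for each observation (row-normalized).
    rows = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

    # Denoise with a rank-n_hidden SVD: rows from the same hidden state
    # should project onto (nearly) the same point.
    u, s, _ = np.linalg.svd(rows, full_matrices=False)
    emb = u[:, :n_hidden] * s[:n_hidden]

    # Naive k-means on the projected rows (sufficient for a sketch).
    rng = np.random.default_rng(seed)
    centers = emb[rng.choice(len(emb), size=n_hidden, replace=False)]
    for _ in range(50):
        dists = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_hidden):
            if np.any(labels == k):
                centers[k] = emb[labels == k].mean(axis=0)
    return labels
```

Once such labels are available, the transition and reward statistics of all observations assigned to the same cluster could be pooled, and an optimistic UCRL-style algorithm run on the smaller clustered state space; this is the role the estimated clustering plays in the algorithm described in the abstract.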
