How hard is my MDP?" The distribution-norm to the rescue"

Maillard, Odalric-Ambrym, Mann, Timothy A., Mannor, Shie

Feb-14-2020, 08:42:42 GMT–Neural Information Processing Systems

In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel $p$. In many problems, a good approximation of $p$ is not needed. For instance, if from one state-action pair $(s,a)$, one can only transit to states with the same value, learning $p(\cdot s,a)$ accurately is irrelevant (only its support matters). This paper aims at capturing such behavior by defining a novel hardness measure for Markov Decision Processes (MDPs) we call the {\em distribution-norm}. The distribution-norm w.r.t. a measure $ u$ is defined on zero $ u$-mean functions $f$ by the standard variation of $f$ with respect to $ u$.

concentration inequality, hardness measure, state-action pair, (3 more...)

Neural Information Processing Systems

Feb-14-2020, 08:42:42 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.42)