Value function estimation in Markov reward processes: Instance-dependent $\ell_\infty$-bounds for policy evaluation

Pananjady, Ashwin, Wainwright, Martin J.

Sep-18-2019–arXiv.org Machine Learning

A variety of applications spanning science and engineering use Markov reward processes as models for real-world phenomena, including queueing systems, transportation networks, robotic exploration, game playing, and epidemiology. In some of these settings, the underlying parameters that govern the process are known to the modeller, but in others, these must be estimated from observed data. A salient example of the latter setting, which forms the main motivation for this paper, is the policy evaluation problem encountered in Markov decision processes (MDPs) and reinforcement learning [Ber95a; Ber95b; SB18]. Here an agent operates in an environment whose dynamics are unknown: at each step, it observes the current state of the environment, and takes an action that changes this state according to some stochastic transition function determined by the environment. The goal is to evaluate the utility of some policy (that is, a mapping from states to actions), where utility is measured using rewards that the agent receives from the environment. These rewards are usually assumed to be additive over time, and since the policy determines the action to be taken at each state, the reward obtained at any time is simply a function of the current state of the agent. Thus, this setting induces a Markov reward process (MRP) on the state space, in which both the underlying transitions and rewards are unknown to the agent. The agent only observes samples of state transitions and rewards.
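The reduction described above, from an MDP plus a fixed policy to an MRP whose value function is then estimated from samples, can be illustrated with a minimal sketch. Several things here are assumptions that the passage does not state: rewards are discounted by a factor `gamma`, the policy is deterministic, the agent can draw a fixed number of independent samples from every state, and reward noise is Gaussian. The function names (`induce_mrp`, `evaluate`, `sample_mrp`) and the toy numbers are hypothetical, and the estimator shown is a plain plug-in approach chosen for illustration, not necessarily the procedure analysed in the paper.

```python
import random


def induce_mrp(P, R, policy):
    """Collapse an MDP into the MRP induced by a deterministic policy.

    P[s][a] is a list of next-state probabilities and R[s][a] a mean reward;
    policy[s] is the action taken in state s.
    """
    P_pi = [P[s][policy[s]] for s in range(len(policy))]
    r_pi = [R[s][policy[s]] for s in range(len(policy))]
    return P_pi, r_pi


def evaluate(P_pi, r_pi, gamma, iters=1000):
    """Iterate V <- r + gamma * P V, which converges for 0 <= gamma < 1."""
    n = len(r_pi)
    V = [0.0] * n
    for _ in range(iters):
        V = [r_pi[s] + gamma * sum(P_pi[s][t] * V[t] for t in range(n))
             for s in range(n)]
    return V


def sample_mrp(P_pi, r_pi, n_samples, rng, noise=0.1):
    """Build an empirical MRP from n_samples simulated observations per state.

    Assumption: Gaussian reward noise with standard deviation `noise`.
    """
    n = len(r_pi)
    P_hat, r_hat = [], []
    for s in range(n):
        nxt = rng.choices(range(n), weights=P_pi[s], k=n_samples)
        P_hat.append([nxt.count(t) / n_samples for t in range(n)])
        rewards = [r_pi[s] + rng.gauss(0.0, noise) for _ in range(n_samples)]
        r_hat.append(sum(rewards) / n_samples)
    return P_hat, r_hat


# Toy 2-state, 2-action MDP (all numbers hypothetical).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
policy = [0, 1]
gamma = 0.5

# The policy fixes one action per state, leaving a Markov reward process.
P_pi, r_pi = induce_mrp(P, R, policy)
V_true = evaluate(P_pi, r_pi, gamma)

# The agent sees only samples, so it evaluates the estimated MRP instead.
rng = random.Random(0)
P_hat, r_hat = sample_mrp(P_pi, r_pi, 5000, rng)
V_hat = evaluate(P_hat, r_hat, gamma)

# Worst-case (l_infinity) error over states.
err = max(abs(a - b) for a, b in zip(V_true, V_hat))
```

Here `V_true` is what the agent would compute if it knew the transitions and rewards, while `V_hat` uses only the sampled data; `err` measures their disagreement in the $\ell_\infty$ norm, the error metric named in the title.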


