Efficiently Solving MDPs with Stochastic Mirror Descent

Aug-28-2020–arXiv.org Machine Learning

Markov decision processes (MDPs) are a fundamental mathematical abstraction for sequential decision making under uncertainty and they serve as a basic modeling tool in reinforcement learning (RL) and stochastic control [5, 24, 30]. Two prominent classes of MDPs are average-reward MDPs (AMDPs) and discounted MDPs (DMDPs). Each have been studied extensively; AMDPs are applicable to optimal control, learning automata, and various real-world reinforcement learning settings [17, 3, 22] and DMDPs have a number of nice theoretical properties including reward convergence and operator monotonicity [6]. In this paper we consider the prevalent computational learning problem of finding an approximately optimal policy of an MDP given only restricted access to the model. In particular, we consider the problem of computing an ɛ-optimal policy, i.e. a policy with an additive ɛ error in expected cumulative reward over infinite horizon, under the standard assumption of a generative model [14, 13], which allows one to sample from state-transitions given the current state-action pair. This problem is well-studied and there are multiple known upper and lower bounds on its sample complexity [4, 32, 28, 31]. In this work, we provide a unified framework based on primal-dual stochastic mirror descent (SMD) for learning an ɛ-optimal policies for both AMDPs and DMDPs with a generative model.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Machine Learning

Aug-28-2020

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Massachusetts > Middlesex County
    - Belmont (0.04)
  - California > Santa Clara County
    - Palo Alto (0.04)
- Europe > United Kingdom
  - England > Greater London > London (0.04)

Genre:
- Research Report (0.64)

Industry:
- Education (0.48)
- Information Technology (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (0.69)
  - Machine Learning
    - Reinforcement Learning (0.89)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.34)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found