On learning history based policies for controlling Markov decision processes

Patil, Gandharv, Mahajan, Aditya, Precup, Doina

Nov-5-2022, arXiv.org Artificial Intelligence

State abstraction and function approximation are vital components used by reinforcement learning (RL) algorithms to efficiently solve complex control problems when exact computations are intractable due to large state and action spaces. Over the past few decades, state abstraction in RL has evolved from the use of pre-determined and problem-specific features [18, 74, 9, 69, 64, 42, 58] to the use of adaptive basis functions learnt by solving an isolated regression problem [53, 47, 39, 56], and more recently to the use of neural network-based Deep-RL algorithms that embed state abstraction in successive layers of a neural network [5, 7]. Feature abstraction results in information loss, and the resulting state features might not satisfy the controlled Markov property, even if this property is satisfied by the corresponding state [70]. One approach to counteract the loss of the Markov property is to generate the features using the history of state-action pairs, and empirical evidence suggests that using such history-based features is beneficial in practice [52]. However, a theoretical characterisation of history-based Deep-RL algorithms for fully observed Markov Decision Processes (MDPs) is largely absent from the literature.

artificial intelligence, machine learning, reinforcement learning
