Randomized Policy Learning for Continuous State and Action MDPs

Hiteshi Sharma, Rahul Jain

arXiv.org Artificial Intelligence 

Recently, reinforcement learning (RL) algorithms based on actor-critic architectures [9] or policy optimization [16] have shown remarkably good performance on continuous control tasks. The policy and the value function are represented by deep neural networks, whose weights are updated accordingly. However, [7] shows that the performance of these RL algorithms varies considerably with changes in hyperparameters, network architecture, etc. Furthermore, [10] showed that a simple linear policy whose weights are updated by a random search method can outperform some of these state-of-the-art results. A key question is how far we can go by relying almost exclusively on these architectural biases. For Markov decision processes (MDPs) with discrete state and action spaces, model-based algorithms based on dynamic programming (DP) ideas [13] can be used when the model is known. Unfortunately, in many problems (e.g., robotics), the system model is unknown, or simply too complicated to be stated succinctly and used in DP algorithms; the latter is usually the more likely case.
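As a concrete illustration of the linear-policy baseline referenced in [10], the sketch below applies a basic random-search update to a linear policy a = M s using episodic returns only, with no gradients. This is not the paper's algorithm; the Gymnasium API, the environment name (Pendulum-v1), and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's method): linear policy a = M @ s,
# weights updated by finite-difference random search on episodic returns.
import numpy as np
import gymnasium as gym

def rollout(env, M, horizon=1000):
    """Total reward of one episode under the linear policy a = M @ s."""
    s, _ = env.reset()
    total = 0.0
    for _ in range(horizon):
        a = M @ s
        s, r, terminated, truncated, _ = env.step(a)
        total += r
        if terminated or truncated:
            break
    return total

def random_search(env_name="Pendulum-v1", iters=100, n_dirs=8, step=0.02, noise=0.03):
    # env_name and hyperparameters are illustrative, not from the paper.
    env = gym.make(env_name)
    d_a, d_s = env.action_space.shape[0], env.observation_space.shape[0]
    M = np.zeros((d_a, d_s))
    for _ in range(iters):
        # Sample random perturbation directions; evaluate +/- perturbed policies.
        deltas = [np.random.randn(d_a, d_s) for _ in range(n_dirs)]
        r_plus = [rollout(env, M + noise * D) for D in deltas]
        r_minus = [rollout(env, M - noise * D) for D in deltas]
        # Move weights along directions weighted by return differences.
        update = sum((rp - rm) * D for rp, rm, D in zip(r_plus, r_minus, deltas))
        M += (step / n_dirs) * update
    return M
```

The point of the sketch is only that such a derivative-free update of a linear policy is simple to state and implement, which is what makes the comparison in [10] striking.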
