Variance Adjusted Actor Critic Algorithms

Oct-14-2013–arXiv.org Machine Learning

In Reinforcement Learning (RL; [1]) and planning in Markov Decision Processes (MDPs; [2]), the typical objective is to maximize the cumulative (possibly discounted) expected reward, denoted by J. When the model's parameters are known, several well-established and efficient optimization algorithms are known. When the model parameters are not known, learning is needed and there are several algorithmic frameworks that solve the learning problem effectively, at least when the model is finite. Among these, actor-critic methods [3] are known to be particularly efficient. In typical actor-critic algorithms, the critic maintains an estimate of the value function - the expected reward-to-go. This function is then used by the actor to estimate the gradient of the objective with respect to some policy parameters, and then improve the policy by modifying the parameters in the direction of the gradient. The theory that underlies actor-critic algorithms is the policy gradient theorem [4], which relates the value function with the policy gradient.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

Oct-14-2013

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (1.00)
  - Machine Learning
    - Reinforcement Learning (0.89)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.49)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found