An introduction to Policy Gradients with Cartpole and Doom

May-16-2018, 03:06:38 GMT – #artificialintelligence

In the last two articles, on Q-learning and Deep Q-learning, we worked with value-based reinforcement learning algorithms. To choose an action in a given state, we take the one with the highest Q-value (the maximum expected future reward for taking that action in that state). As a consequence, in value-based learning a policy exists only because of these action-value estimates. Today, we'll learn a policy-based reinforcement learning technique called Policy Gradients, and apply it to Cartpole and Doom. The first agent will learn to keep the bar in balance.
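To make the contrast concrete, here is a minimal sketch of the two ways of choosing an action. The function names, the example numbers, and the softmax parameterization of the policy are assumptions made for illustration; they do not come from the article's own implementation.

```python
import math
import random

def greedy_action(q_values):
    """Value-based choice: return the index of the highest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def softmax(preferences):
    """Convert raw action preferences into a probability distribution."""
    m = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(probabilities):
    """Policy-based choice: sample an action from the policy's distribution."""
    actions = range(len(probabilities))
    return random.choices(actions, weights=probabilities, k=1)[0]

# Hypothetical Q-values for a state with three actions.
q_values = [0.2, 1.5, -0.3]
best = greedy_action(q_values)

# Hypothetical policy outputs (preferences) for a two-action task such as
# pushing a cart left or right; the policy is stochastic by construction.
probs = softmax([0.5, 1.0])
action = sample_action(probs)
```

In the value-based case the policy is implicit: it is whatever the argmax over the Q-value estimates happens to be. In the policy-based case the action probabilities are the quantity being learned directly.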
