An introduction to Policy Gradients with Cartpole and Doom


In the last two articles, on Q-learning and Deep Q-learning, we worked with value-based reinforcement learning algorithms. To choose which action to take in a given state, we take the action with the highest Q-value (the maximum expected future reward for taking that action in that state). As a consequence, in value-based learning, a policy exists only because of these action-value estimates. Today, we'll learn a policy-based reinforcement learning technique called Policy Gradients. The first agent we build will learn to keep the bar in balance in Cartpole.
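To make the contrast concrete, here is a minimal sketch of the two ways of picking an action. It is an illustration, not the article's implementation: the function names, the numbers, and the use of a softmax to turn preferences into probabilities are all assumptions chosen for the example. The value-based agent picks the action with the largest Q-value; the policy-based agent samples from a probability distribution over actions.

```python
import math
import random


def greedy_action(q_values):
    """Value-based choice: return the index of the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax(preferences):
    """Turn arbitrary action preferences into a probability distribution."""
    highest = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp(p - highest) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng):
    """Policy-based choice: draw an action according to the policy's probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical numbers for a two-action task such as Cartpole (left, right).
q_values = [0.2, 1.5]
preferences = [0.2, 1.5]

greedy = greedy_action(q_values)   # always the same action for these Q-values
probs = softmax(preferences)       # a stochastic policy over the two actions
rng = random.Random(0)             # seeded only so the sketch is repeatable
sampled = [sample_action(probs, rng) for _ in range(1000)]
```

With these made-up numbers the greedy rule always returns the second action, while the sampled policy still picks the first action some of the time. In a policy-based method it is this distribution, rather than a table of Q-values, that learning adjusts.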
