Gradient Policy on "CartPole" game and its' expansibility to F1Tenth Autonomous Vehicles

Shi, Mingwei

arXiv.org Artificial Intelligence 

When learners first study reinforcement learning, the algorithm that usually comes to mind is Q-learning, a classical reinforcement learning algorithm based on value iteration. In mapping states to actions, a value-iteration algorithm lets the system explore according to its policy and updates the state value at each step of the exploration. Value-based iteration, however, has several problems that it cannot avoid. For example, whenever the value of a state is updated, an estimate is needed for every available action. Unlike the discrete actions of walking through a maze, tasks such as robot control and autonomous driving involve continuous actions, and the massive amount of state information they bring makes tabular computation almost impossible. Policy Gradient, a reinforcement learning algorithm based on policy iteration, arose to address this. Policy gradient no longer calculates the reward of each action; instead it directly calculates the probability of taking an action in a given state and selects the action according to that probability.
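
To make the last point concrete, here is a minimal sketch of a policy that outputs action probabilities directly and samples from them, together with a REINFORCE-style update. It is not taken from the paper: the linear softmax parameterization, the function names, the learning rate, and the discount factor are all assumptions chosen for illustration. The four-dimensional state and two actions are chosen to match the shape of the CartPole task.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()


def action_probs(theta, state):
    """Probability of each action in `state` under a linear softmax policy.

    theta has shape (n_actions, state_dim); this parameterization is an
    illustrative assumption, not the paper's model.
    """
    return softmax(theta @ state)


def sample_action(theta, state, rng):
    """Select an action by sampling from the policy's probabilities."""
    probs = action_probs(theta, state)
    return int(rng.choice(len(probs), p=probs))


def grad_log_prob(theta, state, action):
    """Gradient of log pi(action | state) with respect to theta."""
    probs = action_probs(theta, state)
    one_hot = np.zeros_like(probs)
    one_hot[action] = 1.0
    return np.outer(one_hot - probs, state)


def reinforce_step(theta, episode, lr=0.01, gamma=0.99):
    """One REINFORCE update from a list of (state, action, reward) tuples."""
    ret = 0.0
    grad = np.zeros_like(theta)
    # Walk the episode backwards to accumulate discounted returns.
    for state, action, reward in reversed(episode):
        ret = reward + gamma * ret
        grad += ret * grad_log_prob(theta, state, action)
    # Gradient ascent: raise the probability of actions followed by high return.
    return theta + lr * grad
```

In use, one would collect an episode by calling `sample_action` at each step of the environment, then pass the recorded `(state, action, reward)` tuples to `reinforce_step`. No value table is built at any point, which is why the approach carries over to large or continuous state spaces.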
