Gradient Policy on "CartPole" game and its' expansibility to F1Tenth Autonomous Vehicles

Mar-15-2021–arXiv.org Artificial Intelligence

When learners first study reinforcement learning, the algorithm that usually comes to mind first is Q-learning, a classical reinforcement learning algorithm based on value iteration. In the state-to-action mapping process, a value-iteration algorithm lets the system explore according to its policy and updates the state value at each step of the exploration. Value-based iteration, however, has several problems that it cannot avoid. For example, whenever the value of a state is updated, every action available in that state has to be evaluated. Unlike the discrete actions of walking a maze, tasks such as robot control and autonomous driving involve continuous actions, and the massive amount of state information they bring makes tabular computation almost impossible. Policy Gradient, a reinforcement learning algorithm based on policy iteration, arose to address this. Policy gradient no longer computes a value for each action; instead it directly computes the probability of taking each action in a given state and selects the action by sampling from those probabilities.
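
The contrast drawn in the abstract can be made concrete with a minimal sketch of a policy-gradient (REINFORCE-style) learner. It is an illustration under stated assumptions, not the paper's implementation: the linear softmax policy, the four-number state (chosen to resemble a CartPole observation), and the names `theta`, `action_probs`, `sample_action`, and `reinforce_update` are all hypothetical. The point is that the policy maps a state straight to action probabilities and samples from them, with no value table.

```python
import math
import random


def softmax(prefs):
    """Turn raw action preferences into probabilities that sum to 1."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(theta, state):
    """Assumed linear softmax policy: one weight row per action."""
    prefs = [sum(w * s for w, s in zip(row, state)) for row in theta]
    return softmax(prefs)


def sample_action(probs, rng=random):
    """Pick an action by sampling from the policy's probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_update(theta, episode, lr=0.1, gamma=0.99):
    """One REINFORCE step over an episode of (state, action, reward) tuples.

    The gradient is accumulated over the whole episode and then applied once.
    The gamma**t weighting of each step is omitted, a common simplification.
    """
    # Discounted return from each time step to the end of the episode.
    returns, g = [], 0.0
    for _, _, reward in reversed(episode):
        g = reward + gamma * g
        returns.append(g)
    returns.reverse()

    grad = [[0.0] * len(row) for row in theta]
    for (state, action, _), g_t in zip(episode, returns):
        probs = action_probs(theta, state)
        for a in range(len(theta)):
            # d log pi(action | state) / d theta[a] = (1[a == action] - pi(a)) * state
            indicator = 1.0 if a == action else 0.0
            for i, s_i in enumerate(state):
                grad[a][i] += g_t * (indicator - probs[a]) * s_i

    for a, row in enumerate(theta):
        for i in range(len(row)):
            row[i] += lr * grad[a][i]  # gradient ascent on expected return
    return theta


# Usage: two actions (e.g. push left / push right), four state features.
theta = [[0.0] * 4 for _ in range(2)]
state = [1.0, 0.0, 0.5, 0.0]
action = sample_action(action_probs(theta, state))
reinforce_update(theta, [(state, action, 1.0)])
```

Because the policy is a parametric function of the state, the same update applies when states are continuous; extending it to continuous actions would require a different output distribution (for example a Gaussian), which this sketch does not cover.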

Genre:
- Research Report (0.40)

Industry:
- Automobiles & Trucks (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
