Introduction to Various Reinforcement Learning Algorithms. Part II (TRPO, PPO)
Advantage is a term used in many advanced RL algorithms, such as A3C, NAF, and the two algorithms I am going to discuss here, TRPO and PPO (perhaps I will write another blog post on A3C and NAF). Intuitively, think of it as how much better an action is than the average action in a given state. But why do we need advantage? I will use an example posted in this forum to illustrate the idea. Have you ever played a game called "Catch"? In the game, fruits drop from the top of the screen.
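To make the "better than the average action" idea concrete, here is a minimal sketch. It assumes the usual formulation A(s, a) = Q(s, a) - V(s), where V(s) is the policy-weighted average of the action values in that state. The function name, the three actions, and all the numbers are hypothetical and only there for illustration.

```python
import numpy as np

def advantages(q_values, policy_probs):
    """Return A(s, a) = Q(s, a) - V(s) for every action in one state.

    V(s) is taken as the expected action value under the policy:
    V(s) = sum_a pi(a|s) * Q(s, a).
    """
    q = np.asarray(q_values, dtype=float)
    pi = np.asarray(policy_probs, dtype=float)
    state_value = float(np.dot(pi, q))  # the value of the "average action"
    return q - state_value

# Hypothetical values for one state with three actions: left, stay, right.
q_values = [1.0, 2.0, 6.0]
policy_probs = [0.25, 0.5, 0.25]

adv = advantages(q_values, policy_probs)
print(adv)
```

With these made-up numbers, V(s) is 2.75, so the advantages are -1.75, -0.75 and 3.25: only the third action is better than average. Weighted by the policy, the advantages always sum to zero, which is exactly what "compared to the average" means.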
Jan-18-2018, 19:47:02 GMT