Tinkering with Monte Carlo Method in Reinforcement Learning
Monte Carlo, along with Dynamic Programming and Temporal Difference, is one of the main methods beginners encounter in Reinforcement Learning. First, a brief reminder of what the Monte Carlo method is. Monte Carlo is an algorithm that generates paths (each of which constitutes an episode) by following the current policy, which usually balances exploration and exploitation (epsilon-greedy, for example), until the path reaches a terminal state. Once that state is reached, the algorithm walks back through the path and assigns to each state the discounted rewards collected from that state onward during the episode. These values (discounted rewards) are averaged with any values already stored for those states.
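The backward pass and the averaging step can be sketched in a few lines of Python. This is a minimal illustration and not the article's own code: the function name `mc_update`, the episode format (a list of `(state, reward)` pairs), the every-visit variant, and the discount factor of 0.9 are all assumptions made for the example. Generating the episode with an epsilon-greedy policy is left out so the sketch stays focused on the update.

```python
from collections import defaultdict


def mc_update(episode, values, counts, gamma=0.9):
    """Every-visit Monte Carlo update from one finished episode.

    episode: list of (state, reward) pairs, where reward is what the
    agent received after leaving that state (assumed format).
    """
    g = 0.0  # discounted return accumulated from the end of the path
    # Walk the path backwards so each return builds on the one after it.
    for state, reward in reversed(episode):
        g = reward + gamma * g
        counts[state] += 1
        # Incremental mean: fold the new return into the stored average.
        values[state] += (g - values[state]) / counts[state]
    return values


values = defaultdict(float)
counts = defaultdict(int)

# Hypothetical episode: A -> B -> C -> terminal, reward 1 only at the end.
mc_update([("A", 0.0), ("B", 0.0), ("C", 1.0)], values, counts)
# Second hypothetical episode: starts in B and ends with no reward.
mc_update([("B", 0.0), ("C", 0.0)], values, counts)
```

After the first episode, C holds 1.0, B holds 0.9, and A holds 0.81. The second episode contributes a return of 0 to B and C, so their stored values are averaged down to 0.45 and 0.5, while A is unchanged.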
Dec-18-2021, 09:45:58 GMT