Policy Networks vs Value Networks in Reinforcement Learning
In reinforcement learning, an agent starts by making largely random decisions in its environment and learns to select the right action out of many in order to achieve its goal, in some cases playing at a superhuman level. Policy networks and value networks are used together in algorithms such as Monte Carlo Tree Search (MCTS) to perform reinforcement learning. Both networks are an integral part of the exploration step of MCTS. They are closely related to policy iteration and value iteration, since both are recomputed many times in an iterative process. Let's look at why they are so important in machine learning and how they differ.
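To make the distinction concrete, here is a minimal sketch of what each kind of network outputs. It is not taken from the article: the function names, the single linear layer, the three-feature state, and the four actions are all assumptions chosen for illustration. The point is only that a policy network maps a state to a probability distribution over actions, while a value network maps a state to one number estimating how good that state is.

```python
import math
import random


def policy_network(state, weights):
    """Map a state to a probability distribution over actions.

    Hypothetical one-layer model: one linear score per action, then softmax.
    """
    scores = [sum(w * s for w, s in zip(row, state)) for row in weights]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def value_network(state, weights):
    """Map a state to a single scalar in (-1, 1) estimating the expected outcome."""
    return math.tanh(sum(w * s for w, s in zip(weights, state)))


# Illustrative, untrained weights: 3 state features, 4 possible actions.
rng = random.Random(0)
state = [0.5, -1.0, 0.25]
policy_weights = [[rng.uniform(-1, 1) for _ in state] for _ in range(4)]
value_weights = [rng.uniform(-1, 1) for _ in state]

action_probs = policy_network(state, policy_weights)  # one probability per action
state_value = value_network(state, value_weights)     # one number for the state

print("action probabilities:", action_probs)
print("state value:", state_value)
```

In a tree search such as MCTS, the first output would typically guide which moves to try and the second would stand in for the result of playing a position out, although the weights here are random and no learning takes place.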
Aug-5-2018, 11:06:15 GMT