Why does the temporal difference (TD) method have lower variance than the Monte Carlo method?

#artificialintelligence 

This question might be a little trivial, but I have had a hard time understanding it or finding a formal proof for it. Many papers state that, when estimating the value function in reinforcement learning, one advantage of temporal difference methods over Monte Carlo methods is that they have lower variance. So far, I have not been able to find any formal proof of this. It is also said that the Monte Carlo method is less biased than TD methods. If somebody can help me better understand this phenomenon, I would appreciate it.
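To make the claim concrete, here is a toy simulation sketch, not a proof. It compares the Monte Carlo target $G_t = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{T-t-1} R_T$ with the TD(0) target $R_{t+1} + \gamma V(S_{t+1})$. The environment is a hypothetical assumption chosen for illustration: a deterministic chain of `HORIZON` steps with $\gamma = 1$, where every reward is drawn from a normal distribution with mean 1 and standard deviation 1, and `v_next_error` is an assumed error in the current estimate of $V(S_{t+1})$. The helper names (`mc_target`, `td_target`, `compare`) are made up for this sketch.

```python
import random
import statistics

# Hypothetical toy setup (an assumption, not from any paper): a deterministic
# chain of HORIZON steps, gamma = 1, each reward r ~ Normal(1.0, 1.0).
# The true value of a state k steps before the end is therefore k * 1.0.
HORIZON = 10
REWARD_MEAN = 1.0
REWARD_STD = 1.0


def sample_reward(rng):
    """Draw one noisy reward."""
    return rng.gauss(REWARD_MEAN, REWARD_STD)


def mc_target(rng):
    """Monte Carlo target for the start state: the full sampled return G_0.
    It sums HORIZON independent random rewards."""
    return sum(sample_reward(rng) for _ in range(HORIZON))


def td_target(rng, v_next):
    """TD(0) target for the start state: one sampled reward plus the current
    estimate V(S_1) of the next state's value (bootstrapping)."""
    return sample_reward(rng) + v_next


def summarize(samples, true_value):
    """Return (bias, variance) of a list of sampled targets."""
    return statistics.mean(samples) - true_value, statistics.pvariance(samples)


def compare(n_samples=20000, v_next_error=0.5, seed=0):
    rng = random.Random(seed)
    true_v0 = HORIZON * REWARD_MEAN        # true value of the start state
    true_v1 = (HORIZON - 1) * REWARD_MEAN  # true value of the next state
    v_next = true_v1 + v_next_error        # imperfect current estimate V(S_1)
    mc = [mc_target(rng) for _ in range(n_samples)]
    td = [td_target(rng, v_next) for _ in range(n_samples)]
    return summarize(mc, true_v0), summarize(td, true_v0)


if __name__ == "__main__":
    (mc_bias, mc_var), (td_bias, td_var) = compare()
    print(f"MC    : bias={mc_bias:+.3f}  variance={mc_var:.3f}")
    print(f"TD(0) : bias={td_bias:+.3f}  variance={td_var:.3f}")
```

In this toy setting the MC target is unbiased but its variance is roughly `HORIZON * REWARD_STD**2` (about 10), because it accumulates the noise of every reward in the episode. The TD(0) target contains only one random reward, so its variance is about 1, but it inherits whatever error the current estimate $V(S_{t+1})$ has (here a bias of about 0.5). Note that this treats $V(S_{t+1})$ as fixed; in an actual learning run that estimate is itself updated from data, which this sketch does not model.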
