Reviews: Log-normality and Skewness of Estimated State/Action Values in Reinforcement Learning

Neural Information Processing Systems 

This paper addresses a problem arising from skewness in the distribution of value estimates, which may result in over- or under-estimation. Through careful analysis, the paper shows that a particular model-based value estimate is approximately log-normally distributed; because this distribution is skewed, it can lead to over- or under-estimation. It is further shown that positive and negative rewards induce opposite kinds of skewness. The over/underestimation problem is illustrated with simple experiments. Overall, this is an interesting paper that offers some insights into over/underestimation of values.
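
To make the skewness point concrete, here is a minimal sketch; it is not the paper's experiment. It assumes a value estimate is exactly log-normal with arbitrarily chosen parameters (mu = 0, sigma = 0.5), and the helper name `sample_skewness` is hypothetical. It only illustrates two generic facts: a log-normal sample is positively skewed, so its mean exceeds its median, and negating the values (a stand-in for flipping the sign of the rewards) flips the sign of the skewness.

```python
import random
import statistics


def sample_skewness(xs):
    """Standardized third central moment (population form) of a sample."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


# Assumed parameters, chosen only for illustration.
rng = random.Random(0)
pos = [rng.lognormvariate(0.0, 0.5) for _ in range(20000)]

# Stand-in for the negative-reward case: same magnitudes, opposite sign.
neg = [-x for x in pos]

skew_pos = sample_skewness(pos)
skew_neg = sample_skewness(neg)
mean_pos = statistics.fmean(pos)
median_pos = statistics.median(pos)

print(f"positive case: skewness={skew_pos:.3f}, mean={mean_pos:.3f}, median={median_pos:.3f}")
print(f"negative case: skewness={skew_neg:.3f}")
```

In the positive case the mean sits above the median, so a typical single draw falls below the mean; in the negated case the relationship is reversed. How this maps onto over- versus under-estimation in the paper's setting depends on its estimator and is not reproduced here.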