Reviews: A Block Coordinate Ascent Algorithm for Mean-Variance Optimization

Neural Information Processing Systems 

This work deals with risk measures in RL/planning, specifically risk measures that are based on trajectory variance. Unlike the straightforward approach taken in previous works, such as Tamar et al. 2012, 2014 or Prashanth and Ghavamzadeh, 2013, this work does not use multiple time scale stochastic approximation (MTSSA). The authors argue that MTSSA is hard to tune and has a slow convergence rate. Instead, the authors propose an approach that uses Block Coordinate Descent. This approach is based on coordinate descent: at each step of the optimization process, only a subset (block) of the coordinates of the policy parameter is optimized while the remaining coordinates are held fixed.
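
To make the block-update idea concrete, here is a minimal sketch of generic block coordinate descent on a toy quadratic. It is not the algorithm from the paper under review: the objective, the function names (`block_coordinate_descent`, `toy_grad`), the block partition, and the step size are all assumptions chosen for illustration.

```python
def block_coordinate_descent(grad, x0, blocks, step=0.1, sweeps=200):
    """Minimize a smooth function by updating one block of coordinates at a time.

    grad(x) returns the full gradient as a list; only the entries that
    belong to the active block are used in each update.
    """
    x = list(x0)
    for _ in range(sweeps):
        for block in blocks:
            g = grad(x)  # gradient at the current point
            for i in block:
                x[i] -= step * g[i]  # coordinates outside the block stay fixed
    return x


# Hypothetical toy objective: f(x) = (x0 - 1)^2 + (x1 + 2)^2 + (x2 - x0)^2,
# which is minimized at x = (1, -2, 1).
def toy_grad(x):
    return [
        2 * (x[0] - 1) - 2 * (x[2] - x[0]),
        2 * (x[1] + 2),
        2 * (x[2] - x[0]),
    ]


# Two blocks: coordinates {0, 1} are updated together, then coordinate {2}.
solution = block_coordinate_descent(toy_grad, [0.0, 0.0, 0.0], blocks=[[0, 1], [2]])
```

With these assumed settings the iterates approach (1, -2, 1). An ascent variant, as in the title of the reviewed paper, would flip the sign of the update.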