Reviews: Variance Reduced Policy Evaluation with Smooth Function Approximation
Neural Information Processing Systems
Overall, the paper makes a significant contribution to both the reinforcement learning and optimization communities. The proposed algorithm is a variant of the non-convex SAGA algorithm introduced in [1]. The novelty lies in the proof for the non-convex but strongly concave case. Several issues should be addressed: 1. Recasting policy evaluation as a primal-dual optimization problem via Fenchel duality is not new. In fact, [2,3,4] have already exploited this reformulation. First, these related works should be cited appropriately.
Jan-24-2025, 04:15:50 GMT