Reviews: Variance Reduced Policy Evaluation with Smooth Function Approximation
–Neural Information Processing Systems
The main contribution of this paper is a method for solving the finite-sum minimax problem that arises in off-line policy evaluation with nonlinear function approximation. The minimax problem is non-convex in the primal variable and strongly concave in the dual variable, and the authors propose a single-timescale algorithm to find an approximate stationary point. Although the paper does not address the full stochastic TD learning problem, the progress on the finite-sum off-line version is meaningful.
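For orientation, a problem of the kind described above can be written in a generic form. This is only a sketch: the symbols $\theta$, $w$, $f_i$, $n$, $\Phi$, $\epsilon$, $\alpha$, $\beta$ and the gradient estimates $g_\theta$, $g_w$ are illustrative notation assumed here, not taken from the paper or the review. The stationarity measure shown is one common choice and may differ from the one the paper uses.

```latex
% Generic finite-sum minimax problem (illustrative notation, not the paper's):
% theta is the primal variable, w is the dual variable.
\min_{\theta \in \mathbb{R}^d} \; \max_{w \in \mathbb{R}^p} \;
  F(\theta, w) := \frac{1}{n} \sum_{i=1}^{n} f_i(\theta, w),
\qquad
\begin{cases}
  F(\cdot, w) \text{ possibly non-convex for each fixed } w, \\
  F(\theta, \cdot) \text{ strongly concave for each fixed } \theta.
\end{cases}

% One common notion of an approximate stationary point (an assumption here):
\Phi(\theta) := \max_{w} F(\theta, w),
\qquad
\|\nabla \Phi(\theta)\| \le \epsilon .

% A single-timescale primal-dual step uses step sizes alpha and beta of the
% same order, with g_theta and g_w denoting (variance-reduced) gradient estimates:
\theta_{t+1} = \theta_t - \alpha \, g_\theta(\theta_t, w_t),
\qquad
w_{t+1} = w_t + \beta \, g_w(\theta_t, w_t).
```

Strong concavity in $w$ makes the inner maximizer unique, which is why $\Phi$ is well defined and a gradient-norm condition on it is a natural target when the primal side is non-convex.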
Jan-24-2025, 04:15:39 GMT