Review for NeurIPS paper: On the Convergence of Smooth Regularized Approximate Value Iteration Schemes
Correctness:
- The claims are correct for the most part, except for some questions I had about the neural network function approximation section. As this claim doesn't seem to be major, I am willing to weight it less and put the paper at an accept for now. I don't completely follow the argument given, since the use of limiting approximations doesn't seem to allow the use of any inequalities in lines 482-483. This could just be my relative unfamiliarity with NTK.
- What is "overwhelming probability"? Where does the u_j go?
Review for NeurIPS paper: On the Convergence of Smooth Regularized Approximate Value Iteration Schemes
This analysis provides theoretical insights explaining their empirical success. After author feedback and discussion, all reviewers agree that this is a meaningful contribution to a better understanding of existing RL algorithms. This is thus a clear "Accept" decision. That being said, I would like to ask the authors to please add a discussion w.r.t.
On the Convergence of Smooth Regularized Approximate Value Iteration Schemes
Entropy regularization, smoothing of Q-values, and neural network function approximators are key components of state-of-the-art reinforcement learning (RL) algorithms such as Soft Actor-Critic \cite{haarnoja2018soft}. Despite their widespread use, the impact of these core techniques on the convergence of RL algorithms is not yet fully understood. In particular, our analysis shows that (1) value smoothing results in increased stability of the algorithm in exchange for slower convergence, and (2) entropy regularization reduces overestimation errors at the cost of modifying the original problem; we also study (3) a combination of these techniques that describes the Soft Actor-Critic algorithm.
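To make the two ingredients named in the abstract concrete, here is a minimal tabular sketch. It is not taken from the paper: it assumes that "entropy regularization" corresponds to the standard log-sum-exp (soft) Bellman operator with temperature `lam`, and that "value smoothing" corresponds to averaging the previous Q-table with the new backup using a weight `beta`. The function names, the parameters, and the exact (sample-free) backup are all illustrative assumptions; the paper itself analyzes approximate schemes with function approximation.

```python
import numpy as np


def soft_bellman(Q, P, R, gamma, lam):
    """Entropy-regularized Bellman backup (assumed form, not the paper's exact operator).

    Q: (S, A) action values, P: (S, A, S) transition probabilities,
    R: (S, A) rewards, lam: temperature of the entropy regularizer.
    """
    # Soft state value lam * log(sum_a exp(Q(s, a) / lam)), computed stably
    # by subtracting the row-wise maximum before exponentiating.
    m = Q.max(axis=1, keepdims=True)
    V = m[:, 0] + lam * np.log(np.exp((Q - m) / lam).sum(axis=1))
    # Expected next-state soft value under P, discounted and added to the reward.
    return R + gamma * (P @ V)


def smooth_regularized_vi(P, R, gamma=0.9, lam=0.1, beta=0.5, iters=500):
    """Value iteration with smoothing weight beta (beta=1 recovers the plain soft backup)."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        # Smoothing: move only a fraction beta of the way toward the new backup.
        Q = (1.0 - beta) * Q + beta * soft_bellman(Q, P, R, gamma, lam)
    return Q
```

In this exact tabular setting, the smoothed update contracts with modulus `(1 - beta) + beta * gamma`, which is larger than `gamma` whenever `beta < 1`. That gives a simple picture of the "slower convergence" side of the trade-off. The stability benefit the abstract describes concerns approximation errors, which this sketch does not model.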