Reviews: VIREL: A Variational Inference Framework for Reinforcement Learning

Neural Information Processing Systems 

This paper brings a novel perspective on probabilistic frameworks for deriving new reinforcement learning algorithms, and the adaptive temperature reweighting may lead to more insightful exploration being built into our RL algorithms. The paper is clearly written, well organized, and easy to understand. The appendix is clearly structured as well, although its sheer length makes the paper a little unwieldy to read. The authors have clearly put a lot of work into developing the theory and presentation in this paper. Although the empirical performance of the derived algorithms does not show significant improvement over max-ent RL methods (with twin Q functions as in TD3), the approach is interesting and I believe this paper would be well suited for NeurIPS.

Some specific comments:
- In the definition of the residual error on L147, over what distribution is the L_p norm taken?
- Instead of e_w being a global constant, have the authors considered parametrizing e_w as a function of h? This would allow for state-adaptive uncertainty and exploration, and I believe a majority of the results would still hold. However, most works within the Max-Ent framework parametrize the variational distribution only through the action distributions, and fix the variational distribution over dynamics to the actual dynamics model.