Review for NeurIPS paper: Robust Reinforcement Learning via Adversarial Training with Langevin Dynamics

Neural Information Processing Systems 

The paper focuses on robust RL in an adversarial setting in which two policies are optimized simultaneously and one is used to perturb the other. In this setting the paper proposes a parameter-sampling approach and provides theoretical and empirical evidence for why parameter sampling is more effective than deterministic optimization.

The reviewers agree that the paper is well motivated and addresses a topic relevant to the community, and some reviewers appreciate the combination of simple conceptual examples and a larger-scale empirical evaluation. Although the majority of the reviewers view the paper favorably, they highlight a number of areas for improvement:

- Presentation: a more explicit discussion of contributions and prior work (R1); a more self-contained presentation and a discussion of the significance of concepts such as the Nash equilibrium (R2).
- Experimental evaluation: presentation and analysis of results (R1/R2/R3); use of TD3 instead of DDPG results in the main text (all reviewers).
- Limited novelty compared to [12] (R1).
- Limitations of the scope of the theoretical results and of the problem formulation.

The authors have provided responses to the main criticisms, and the paper and the responses were extensively discussed by the reviewers.