Review for NeurIPS paper: Model-based Adversarial Meta-Reinforcement Learning


Additional Feedback: After reading the other reviews and the authors' rebuttal, I have increased my score to 7. The additional experiments are greatly appreciated, but more details should be provided for them: e.g., I feel that if the policy has all the necessary information and is trained with a model-free approach, it should be able to obtain comparable or better results than a model-based approach (at much worse sample complexity, of course). That said, the comparison between model-based and model-free methods is not the focus of this work, and the experiments with model-based baselines do show good results. I think the paper presents an interesting idea for improving the robustness of model-based RL methods to different reward functions. I have a few questions regarding the details of the algorithm, as listed below.