Review for NeurIPS paper: Differentiable Meta-Learning of Bandit Policies

Neural Information Processing Systems 

The rebuttal helped clarify the questions raised in the review. The consensus reached in the discussion is that this is a borderline-plus paper. The reviewers appreciate the contribution's practicality, relevance and usefulness, and at the same time they do remain concerned about the narrow scope, and would rather have seen the policy-gradient method applied to parameterized policies for more complex learning problems. On the whole, this is a worthwhile addition to the program. The rebuttal did not answer one question successfully, namely regarding the setup in the experiments section, where the learning process operating at two-levels remained confusing.