Review for NeurIPS paper: Differentiable Meta-Learning of Bandit Policies