Review for NeurIPS paper: Adam with Bandit Sampling for Deep Learning

Neural Information Processing Systems 

Additional Feedback: This work proposes an approach for sampling minibatches that could plausibly be applied to optimization procedures other than Adam. Was this approach (or a suitable variant) explored, perhaps empirically, for other optimization procedures that use minibatches? The approach can also be seen as a way to select the minibatches that are most useful for training, so how does it compare to the state of the art in curriculum learning? In Algorithm 2, Line 2, what is L?