Review for NeurIPS paper: Adversarial Counterfactual Learning and Evaluation for Recommender System

Neural Information Processing Systems 

The authors state that "the recommendation model is optimized over the worst-case exposure mechanism" but fail to give a clear motivation for this design. Why is optimizing against the worst-case exposure better than optimizing under the expected exposure, which is widely adopted by existing methods? It seems that the essential advantage of the proposed method is robustness. Uncertainty is not a good motivation, as it has already been considered by existing methods and cannot answer the above question. The proposed method should be compared with existing unbiased recommendation methods (e.g. The differences in solutions and generalization bounds between this paper and [a5][a6] should be discussed.