Review for NeurIPS paper: An Unbiased Risk Estimator for Learning with Augmented Classes

Neural Information Processing Systems 

How would the non-negative risk estimator be used in this problem? In particular, where should the max-operator be added in the risk estimator? I think it is important to clarify this part. I am aware that the choice of loss is identical to that of Kiryo et al.; my question is whether you have tried different loss functions. For the analysis of infinite-sample consistency in Theorem 1, the choice of loss function is quite restrictive and does not cover many losses, such as the sigmoid loss.
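
As a point of reference for the max-operator question, here is a minimal sketch of where the clipping sits in the positive-unlabeled (PU) setting of Kiryo et al. It is not the estimator proposed in the paper under review, whose decomposition for augmented classes may call for the max-operator in a different place. The function and argument names (`nn_pu_risk`, `scores_pos`, `scores_unl`, `class_prior`) are hypothetical and chosen only for illustration, and the sigmoid loss is used purely as an example of a loss one might try.

```python
import math


def sigmoid_loss(margin):
    """Sigmoid loss l(z) = 1 / (1 + exp(z)) evaluated at the margin z = y * g(x)."""
    return 1.0 / (1.0 + math.exp(margin))


def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


def nn_pu_risk(scores_pos, scores_unl, class_prior, loss=sigmoid_loss):
    """Non-negative PU risk in the style of Kiryo et al. (illustrative sketch).

    scores_pos  -- classifier outputs g(x) on labeled positive samples
    scores_unl  -- classifier outputs g(x) on unlabeled samples
    class_prior -- assumed known proportion of positives in the unlabeled data
    """
    # Risk of labeled positives when treated as positive (y = +1).
    risk_pos_as_pos = mean([loss(s) for s in scores_pos])
    # Risk of labeled positives when treated as negative (y = -1).
    risk_pos_as_neg = mean([loss(-s) for s in scores_pos])
    # Risk of unlabeled samples when treated as negative (y = -1).
    risk_unl_as_neg = mean([loss(-s) for s in scores_unl])

    # Unbiased estimate of the negative-class risk; it can go below zero
    # on finite samples even though the true risk is non-negative.
    neg_part = risk_unl_as_neg - class_prior * risk_pos_as_neg

    # The max-operator clips only this negative-class term at zero.
    return class_prior * risk_pos_as_pos + max(0.0, neg_part)
```

In this sketch the max-operator wraps only the difference term that estimates the negative-class risk, since that is the only part that can become negative on finite samples. The analogous question for the paper is which term of its estimator plays this role.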