Review for NeurIPS paper: Generalization Bound of Gradient Descent for Non-Convex Metric Learning

The SMILE classifier is essentially the Nadaraya-Watson estimator, using a Gaussian kernel with a learned Mahalanobis metric in place of the Euclidean one, except that the support vectors (representative instances) are also learned rather than taken to be the training points. It is not clear how much of SMILE's advantage comes from learning these representative instances. I suspect they account for much of its edge over the competing algorithms, since the underlying classifier is the very simple Nadaraya-Watson estimator, as noted above, and the competitors were not offered a comparable chance to learn good prototypes. One way to rebut this criticism would be to run an ablation in which SMILE is restricted to using a subset of the training points as prototypes, or else to afford all other methods the same opportunity. It would also have been nice if some contemporary applications, e.g. with VAEs or deep metric learning, had been explored.
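To make the comparison concrete, here is a minimal sketch (my own, not the authors' code) of the Nadaraya-Watson classifier with a Gaussian kernel on a Mahalanobis distance; in SMILE both the metric `M` and the `prototypes` would be learned, whereas the restricted ablation I suggest would fix the prototypes to training points:

```python
import numpy as np

def nadaraya_watson_predict(X, prototypes, labels, M, bandwidth=1.0):
    """Nadaraya-Watson class scores with a Gaussian kernel on the squared
    Mahalanobis distance d(x, p)^2 = (x - p)^T M (x - p).

    X          : (n, d) query points
    prototypes : (k, d) representative instances (learned in SMILE)
    labels     : (k, c) one-hot labels attached to the prototypes
    M          : (d, d) positive semi-definite metric (learned in SMILE)
    """
    diffs = X[:, None, :] - prototypes[None, :, :]       # (n, k, d)
    d2 = np.einsum('nkd,de,nke->nk', diffs, M, diffs)    # squared Mahalanobis
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))             # Gaussian kernel weights
    # kernel-weighted average of the prototype labels -> soft class scores
    return w @ labels / np.clip(w.sum(axis=1, keepdims=True), 1e-12, None)
```

Running this with `prototypes` set to (a subset of) the training points recovers the plain kernel-smoother baseline against which the learned-prototype variant should be compared.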