Reviews: Neural Proximal Gradient Descent for Compressive Imaging

Neural Information Processing Systems 

While my concerns received significant attention in the rebuttal, I feel they were not fully addressed. In particular, regarding the comparison with deep ADMM-net and LDAMP, the authors argue that these methods need more training data and training time. Training time is normally not a big issue (you only train your model once; does it matter whether it takes 2 hours or 10?). The *size* of the training data is important, however, yet no experiments are provided to show superior performance of the proposed method with respect to the size of the training data. This is surprising given that in l. 62 the authors claim to use "much less training data" (addressing the challenge of "scarcity of training data" mentioned in l. 4 of the abstract), without referring back to this claimed contribution anywhere in the paper!