Review for NeurIPS paper: Autoregressive Score Matching

Neural Information Processing Systems 

Weaknesses: EXPERIMENTS
* Section 5.1, Figure 2 - Given that the vertical axes are on completely different scales, it's unclear what this comparison shows. Even though the loss curve for CSM plateaus more quickly than the one for DSM, that doesn't necessarily imply that the trained model achieves better density estimation performance. The paper claims "less shifted colors" when using CSM, but there doesn't seem to be a noticeable difference between MLE and CSM for CelebA. So without a comparison to other methods or a quantitative metric (such as PSNR), the denoising results seem to serve only as a quick sanity check. A more thorough experiment would be necessary to demonstrate that the models trained under CSM are "sufficiently expressive to capture complex distributions and solve difficult tasks."
* Section 6 - The NLL and FID improvements are very small.
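
To make the Section 5.1 request concrete: a quantitative denoising comparison could be as simple as reporting PSNR between each clean image and its denoised output, per training objective. The following is a minimal sketch, not the authors' evaluation code. The function name `psnr`, the `max_val` parameter, and the assumption that images are arrays scaled to [0, 1] are all illustrative choices of mine.

```python
import numpy as np


def psnr(clean, denoised, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    Assumes pixel values lie in [0, max_val] (hypothetical convention;
    use max_val=255 for 8-bit images).
    """
    clean = np.asarray(clean, dtype=np.float64)
    denoised = np.asarray(denoised, dtype=np.float64)
    if clean.shape != denoised.shape:
        raise ValueError("clean and denoised must have the same shape")

    # Mean squared error over all pixels and channels.
    mse = np.mean((clean - denoised) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Averaging this value over a held-out set for the MLE, DSM, and CSM models would give a single number per method, which would be easier to compare than visual impressions of color shift.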