Review for NeurIPS paper: Denoising Diffusion Probabilistic Models

Neural Information Processing Systems 

However, the proposed approach empirically shows a large advantage over NCSN. Can the authors elaborate on what accounts for this difference? To my knowledge, the differences are: (1) The number of noise levels (denoted as L): for the diffusion model, L = 1000. (2) The variance schedule (denoted as beta_t, which corresponds to \sigma^2 in NCSN): for the diffusion model, beta_1 = 1e-4 and beta_T = 0.02, and a linear schedule is employed. For NCSN, a geometric sequence is used, and beta_T is much larger for NCSNv2.
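To make the two schedule shapes concrete, here is a minimal sketch in plain Python. The linear endpoints and the number of levels are the values quoted above. The geometric endpoints and level count (`1.0`, `0.01`, `10`) are placeholder assumptions chosen only for illustration, and the function names are hypothetical, not taken from either codebase.

```python
def linear_schedule(beta_start, beta_end, num_levels):
    """Evenly spaced variances from beta_start to beta_end (inclusive)."""
    if num_levels == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_levels - 1)
    return [beta_start + i * step for i in range(num_levels)]


def geometric_schedule(sigma_start, sigma_end, num_levels):
    """Noise scales with a constant ratio between consecutive levels."""
    if num_levels == 1:
        return [sigma_start]
    ratio = (sigma_end / sigma_start) ** (1.0 / (num_levels - 1))
    return [sigma_start * ratio ** i for i in range(num_levels)]


# Values quoted in the review: beta_1 = 1e-4, beta_T = 0.02, L = 1000.
ddpm_betas = linear_schedule(1e-4, 0.02, 1000)

# Placeholder values (assumption, for illustration only): a geometric
# sequence of noise scales, squared to obtain variances sigma^2.
ncsn_sigmas = geometric_schedule(1.0, 0.01, 10)
ncsn_variances = [s ** 2 for s in ncsn_sigmas]

print("linear: first", ddpm_betas[0], "last", ddpm_betas[-1])
print("geometric: first", ncsn_variances[0], "last", ncsn_variances[-1])
```

Under these assumptions, the linear schedule changes by a fixed increment per level, while the geometric one changes by a fixed ratio, so the two cover the range between their endpoints very differently.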