Review for NeurIPS paper: Further Analysis of Outlier Detection with Deep Generative Models

Neural Information Processing Systems 

Summary and Contributions: ----- Update ----- I have read the author response as well as the other reviews. I agree with some of the concerns raised by the other reviewers, but do not find them significant enough to call into question the overall value and insights of this paper. I still vote to accept, but lower my score to 7. ------------------ This work further analyzes the recently observed issue that Deep Generative Models (DGMs) regularly assign higher likelihood to out-of-distribution (OOD) samples/outliers than to in-distribution data. Based on the phenomenon that typical sets (regions containing most of the probability mass, into which samples are likely to fall) need not coincide with density level sets (high-density/likelihood regions) in high dimensions, a novel white noise test for outlier detection is proposed. This test shows a marked improvement in detection performance over previous tests *using the same models* on common benchmarks (in-/out-of-distribution combinations of CIFAR-10, SVHN, CelebA, and TinyImageNet), thereby suggesting that DGMs are not necessarily uncalibrated, but rather that existing likelihood-based tests might be improperly formulated or applied.
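The gap between typical sets and density level sets can be illustrated with a minimal sketch. It is not taken from the paper and does not implement the proposed white noise test. It assumes a standard normal distribution in `d` dimensions as a stand-in for a DGM, and the names `gaussian_log_density` and `typical_set_demo` are hypothetical. The sketch compares the log density at the mode (the origin) with the log densities of actual samples, whose norms concentrate near `sqrt(d)`.

```python
import numpy as np


def gaussian_log_density(x):
    """Log density of a standard normal N(0, I) evaluated at each row of x."""
    d = x.shape[-1]
    return -0.5 * np.sum(x ** 2, axis=-1) - 0.5 * d * np.log(2.0 * np.pi)


def typical_set_demo(d=1000, n=2000, seed=0):
    """Compare the mode's log density with that of samples in d dimensions.

    Illustrative assumption: N(0, I) stands in for a trained generative model.
    """
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((n, d))

    sample_logp = gaussian_log_density(samples)
    mode_logp = gaussian_log_density(np.zeros((1, d)))[0]

    return {
        # Samples lie close to a thin shell of radius about sqrt(d).
        "mean_sample_norm": float(np.linalg.norm(samples, axis=1).mean()),
        "sqrt_d": float(np.sqrt(d)),
        # The mode has the highest density, yet no sample lands near it.
        "mode_logp": float(mode_logp),
        "max_sample_logp": float(sample_logp.max()),
    }


if __name__ == "__main__":
    for key, value in typical_set_demo().items():
        print(f"{key}: {value:.2f}")
```

In this toy setting the origin has a higher density than every drawn sample, even though it lies far from where samples actually fall. A test that flags only low-likelihood points would therefore accept it as an inlier, which is the kind of mismatch the summary above describes.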