Review for NeurIPS paper: Flows for simultaneous manifold learning and density estimation

Neural Information Processing Systems 

In lines 245-248 the authors discuss a fair comparison between the different methods and mention their effort to keep the total number of coupling layers the same across methods. Can the authors please also comment on the difference in the number of parameters? Because the coupling layers in M-Flows do not always act on data of the same dimensionality as those in regular AF flows, the number of parameters can differ even when the number of coupling layers is the same.

For the CelebA dataset, have you tried training M-Flows with a value of n other than 512?

4. Can you explain in the main text, at a high level, why including the SCANDAL loss consistently leads to a larger closure for all methods (lower closure is better)?

In general, since the supplementary material contains so much additional material, it would help the reader if the main text referred more frequently to the relevant parts of the supplement.
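
To make the parameter-count question about lines 245-248 concrete, here is a minimal sketch of the kind of accounting I have in mind. It is not the authors' architecture: it assumes plain affine coupling layers whose scale and shift come from a fully connected network, and the dimensions (ambient 64, latent 2), hidden width, and layer split are hypothetical values chosen only for illustration.

```python
def mlp_param_count(sizes):
    """Weights plus biases of a fully connected net with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


def affine_coupling_param_count(dim, hidden=100, n_hidden_layers=2):
    """Parameters of one hypothetical affine coupling layer acting on `dim` inputs.

    Assumption: half of the inputs condition a scale and a shift
    for the remaining inputs via an MLP.
    """
    d_cond = dim // 2
    d_trans = dim - d_cond
    sizes = [d_cond] + [hidden] * n_hidden_layers + [2 * d_trans]
    return mlp_param_count(sizes)


# Hypothetical setup: 10 coupling layers in total for both models.
ambient_dim, latent_dim = 64, 2

# Ambient flow: all 10 layers act on the full ambient dimension.
af_total = 10 * affine_coupling_param_count(ambient_dim)

# Manifold flow: 5 layers act on the ambient dimension, 5 on the latent one.
mflow_total = (5 * affine_coupling_param_count(ambient_dim)
               + 5 * affine_coupling_param_count(latent_dim))

print("ambient flow parameters: ", af_total)
print("manifold flow parameters:", mflow_total)
```

Under these assumptions the two models have equal depth but different parameter counts, which is why reporting the counts alongside the number of coupling layers would make the comparison easier to judge.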