Review for NeurIPS paper: A Universal Approximation Theorem of Deep Neural Networks for Expressing Probability Distributions


Weaknesses: I view this paper positively overall, but the following points remain unclear to me. Is it possible to characterize the number of weights the network needs to achieve the stated approximation? Several important papers on approximation capability investigate the relation between the number of weights and the approximation power; how does the present result relate to those works? Also, is it possible to give a similar approximation rate when the target distribution does not admit a density?