Review for NeurIPS paper: Distributionally Robust Parametric Maximum Likelihood Estimation

Neural Information Processing Systems 

Since everything is parametric, I'd expect explicit rates of convergence involving all problem complexity parameters (n, m, p, etc.). To make the rest of my points clear, let me recall the notation used in the paper:
- n: the dimensionality of the covariate (i.e., feature vector) X. Thus X is a random vector in R^n. BTW, in the context of ML or stats, I'd use another notation here, as n conventionally stands for "sample size".