... presentation and fix all minor issues in the final version.

... distributions within the ball of an appropriate radius ε (see Eq. (1)), which could also include the unknown real distribution P

Neural Information Processing Systems 

We thank the reviewers for their constructive comments.

First, generators in existing methods tend to fit the empirical distribution. Given a poor training set, their generated data could be even worse. Second, these generators often produce "easy" samples ...

Since ε is unknown, it is common to take λ as a hyperparameter to be tuned in experiments (e.g. ...

Moreover, the generator could conduct "data augmentation" for the ...

We may thus obtain a slightly better result, e.g. ...