A Proof of Eq. 6

Neural Information Processing Systems 

In this section, we derive the gradients of the objective in Eq. …. As we mentioned in Sec. …. Since the dimensions of the random variable are independent, it suffices to consider a single dimension.

A limitation of the dataset is its relatively small size. It does not contain all ImageNet classes, in particular classes of objects with deformable shapes (e.g., …).

We provide additional experimental results in this section.