Supplementary Material

A Algorithmic Details


A.1 Data Selection via Time-Consistency

We use time-consistency (TCS) [63] to select the informative samples to which our augmentation is applied; TCS effectively improves performance. T is set to 5 for all experiments (a sketch of the selection rule is given at the end of this section).

Algorithm 2: Fast Lagrangian Attack Method
Input: training data (x, y); the class-preserving margin σ; neural network F(·)
Output: …

A hedged sketch of this attack also appears at the end of this section.

… $= I(X; Y)$.

2) Proof of Null-Minimality. Since X is a deterministic function of Y and N, we have

$I(X; Y \mid N) = I(X; Y).$ (33)

Note that (33) holds for all sufficient statistics of X w.r.t. Y. The proof of null-minimality is identical to the one under Problem (8).

Either of the two conditions in Theorem 4.2, Condition (a) or Condition (b), requires that the augmentation … We show by Lemma B.2 that this InfoMin principle … In contrast, our Theorem 4.2 characterizes two key conditions of …

… $\sum_{x, y} P(X = x, Y = y)\,\log \frac{P(X = x, Y = y)}{P(X = x)\,P(Y = y)}$ (44)
$= I(X; Y),$ (45)

where the third equation utilizes the property of symmetric augmentation (a numerical check of the identity in (44)–(45) is given at the end of this section).

Lemma B.2. If Assumption 4.1 holds, i.e., …

Lemma B.2 can be obtained by a simple adaptation of Proposition 3.1 of Achille and Soatto […].

All models are trained for 300 epochs. All noise is symmetric.
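The TCS statistic of [63] is not reproduced above, so the following is a minimal sketch of the selection step, assuming each sample is scored by how often its prediction stays unchanged over the last T = 5 epochs and that the least consistent samples are treated as "informative". The scoring rule, the top-k selection, and all function names are illustrative assumptions, not the exact method of [63].

```python
import numpy as np

def tcs_scores(pred_history: np.ndarray) -> np.ndarray:
    """Time-consistency score per sample.

    pred_history: (T, N) array of predicted labels over the last T epochs
    (T = 5 in all experiments). Here a sample is scored by the fraction of
    consecutive epochs on which its prediction stays unchanged; the exact
    statistic in [63] may differ -- this is an illustrative stand-in.
    """
    same = (pred_history[1:] == pred_history[:-1])  # (T-1, N) agreement flags
    return same.mean(axis=0)                        # in [0, 1]; higher = more consistent

def select_informative(pred_history: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k *least* time-consistent samples, i.e. the ones
    to which the augmentation is applied (assumed selection rule)."""
    scores = tcs_scores(pred_history)
    return np.argsort(scores)[:k]
```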
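The body and output of Algorithm 2 are not recoverable above, so the sketch below only illustrates the general shape such an attack could take: a Lagrangian relaxation that ascends the training loss while a penalty keeps the true-class logit margin above the class-preserving margin σ. The objective, the step size, the multiplier `lam`, and the iteration count are all assumptions; only the inputs (x, y), σ, and the network F come from the algorithm's statement.

```python
import torch
import torch.nn.functional as nnF  # aliased to avoid clashing with the network F

def fast_lagrangian_attack(model, x, y, sigma, step=0.01, lam=10.0, n_steps=5):
    """Sketch of a Lagrangian-relaxed, class-preserving attack (assumed form).

    Ascends the cross-entropy loss on (x, y) while a penalty term keeps the
    logit margin of the true class above sigma, so the perturbed sample can
    still plausibly carry label y. A norm constraint on delta (e.g. an
    epsilon-ball projection) would typically be added in practice.
    """
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(n_steps):
        logits = model(x + delta)
        ce = nnF.cross_entropy(logits, y)
        # Margin of the true class over the best competing class
        true_logit = logits.gather(1, y[:, None]).squeeze(1)
        others = logits.scatter(1, y[:, None], float('-inf')).amax(dim=1)
        margin = true_logit - others
        # Lagrangian objective: push the loss up, penalize margin < sigma
        obj = ce - lam * nnF.relu(sigma - margin).mean()
        (grad,) = torch.autograd.grad(obj, delta)
        delta = (delta + step * grad.sign()).detach().requires_grad_(True)
    return (x + delta).detach()
```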
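As a numerical companion to the identity in (44)–(45), the snippet below computes $I(X; Y)$ directly from the definition-side sum over a joint distribution; the toy joint table is invented purely for illustration.

```python
import numpy as np

def mutual_information(joint: np.ndarray) -> float:
    """I(X; Y) from the definition used in (44)-(45):
    sum_{x,y} P(x,y) * log( P(x,y) / (P(x) P(y)) )."""
    px = joint.sum(axis=1, keepdims=True)   # marginal P(X = x)
    py = joint.sum(axis=0, keepdims=True)   # marginal P(Y = y)
    mask = joint > 0                        # 0 * log 0 = 0 by convention
    return float((joint[mask] * np.log(joint[mask] / (px * py)[mask])).sum())

# Toy joint distribution over two binary variables (illustrative only)
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
print(mutual_information(joint))  # positive: X and Y are dependent
```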