Supplementary Material A Algorithmic Details

Neural Information Processing Systems

A.1 Data Selection Via Time-Consistency

We use time-consistency (TCS) [63] to select the informative samples to which our augmentation is applied; TCS can effectively improve the performance. T is set to 5 for all the experiments.

Algorithm 2: Fast Lagrangian Attack Method
Input: training data (x, y); the class-preserving margin σ; neural network F(·)
Output: …

… = I(X; Y).

2) Proof of null-minimality. Since X is a deterministic function of Y and N, we have

    I(X; Y | N) = I(X; Y)   (33)

Note that (33) holds for all sufficient statistics of X w.r.t. … The proof of null-minimality is identical to the one under Problem (8).

The two conditions in Theorem 4.2, Condition (a) or Condition (b), require that the augmentation … We show by Lemma B.2 that this InfoMin principle … In contrast, our Theorem 4.2 characterizes two key conditions of …

    ∑_{x,y} P(X = x, Y = y) log [ P(X = x, Y = y) / ( P(X = x) P(Y = y) ) ]   (44)
    = I(X; Y)   (45)

where the third equation utilizes the property of symmetric augmentation.

Lemma B.2: If Assumption 4.1 holds, i.e., … Lemma B.2 can be obtained by a simple adaptation of Proposition 3.1 by Achille and Soatto [ ].

All the models are trained for 300 epochs. All the noise is symmetric.
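The TCS selection step above can be sketched as follows. The exact scoring rule of [63] is not recoverable from this excerpt, so the sketch assumes, for illustration only, that a sample's inconsistency is the mean symmetric KL divergence between its predicted class distributions in consecutive epochs over a window of T = 5 epochs, and that the top-k least consistent samples are the ones selected for augmentation.

```python
import numpy as np

def tcs_scores(prob_history):
    """Per-sample time-inconsistency score: mean symmetric KL divergence
    between the predicted distributions of consecutive epochs.
    prob_history has shape (T, N, C): last T epochs, N samples, C classes.
    NOTE: this scoring rule is an illustrative assumption, not the exact
    definition of TCS from [63]."""
    T, N, C = prob_history.shape
    eps = 1e-12  # avoid log(0)
    score = np.zeros(N)
    for t in range(T - 1):
        p = prob_history[t] + eps
        q = prob_history[t + 1] + eps
        kl_pq = np.sum(p * np.log(p / q), axis=1)
        kl_qp = np.sum(q * np.log(q / p), axis=1)
        score += 0.5 * (kl_pq + kl_qp)
    return score / (T - 1)

def select_informative(prob_history, k):
    """Pick the k samples whose predictions changed most across epochs."""
    return np.argsort(-tcs_scores(prob_history))[:k]
```

A sample whose prediction flips between epochs receives a high score and is selected; a sample the model predicts identically every epoch scores zero and is skipped.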




Adversarial Auto-Augment with Label Preservation: A Representation Learning Principle Guided Approach

Yang, Kaiwen, Sun, Yanchao, Su, Jiahao, He, Fengxiang, Tian, Xinmei, Huang, Furong, Zhou, Tianyi, Tao, Dacheng

arXiv.org Artificial Intelligence

Data augmentation is a critical contributing factor to the success of deep learning but heavily relies on prior domain knowledge, which is not always available. Recent works on automatic data augmentation learn a policy to form a sequence of augmentation operations, which are still pre-defined and restricted to limited options. In this paper, we show that the objective of prior-free autonomous data augmentation can be derived from a representation learning principle that aims to preserve the minimum sufficient information of the labels. Given an example, the objective aims at creating a distant "hard positive example" as the augmentation, while still preserving the original label. We then propose a practical surrogate to the objective that can be optimized efficiently and integrated seamlessly into existing methods for a broad class of machine learning tasks, e.g., supervised, semi-supervised, and noisy-label learning. Unlike previous works, our method does not require training an extra generative model but instead leverages the intermediate layer representations of the end-task model for generating data augmentations. In experiments, we show that our method consistently brings non-trivial improvements to the three aforementioned learning tasks in both efficiency and final performance, whether or not combined with strong pre-defined augmentations, e.g., on medical images where domain knowledge is unavailable and existing augmentation techniques perform poorly.
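The core idea of a distant "hard positive" that still preserves the label, enforced via a class-preserving margin σ as in Algorithm 2 of the supplementary material, can be illustrated with a toy sketch. Everything here is an assumption for illustration: a fixed linear head (W, b) stands in for the end-task model's classifier on top of an intermediate representation, and a random-direction line search stands in for the actual Fast Lagrangian Attack, which this excerpt does not specify.

```python
import numpy as np

def margin(W, b, h, y):
    """Logit margin of class y over the best competing class for feature h."""
    logits = W @ h + b
    return logits[y] - np.delete(logits, y).max()

def label_preserving_perturb(W, b, h, y, sigma, n_dirs=64, seed=0):
    """Push feature h as far as possible along random unit directions while
    the linear head still predicts y with margin >= sigma.
    Hypothetical stand-in for the paper's adversarial feature augmentation."""
    rng = np.random.default_rng(seed)
    best, best_dist = h.copy(), 0.0
    for _ in range(n_dirs):
        d = rng.normal(size=h.shape)
        d /= np.linalg.norm(d)
        lo, hi = 0.0, 10.0
        for _ in range(40):  # binary search for the largest label-safe step
            mid = 0.5 * (lo + hi)
            if margin(W, b, h + mid * d, y) >= sigma:
                lo = mid
            else:
                hi = mid
        if lo > best_dist:
            best_dist, best = lo, h + lo * d
    return best, best_dist
```

The returned feature is as far from the original as the search can push it while the (assumed) classifier still assigns the original label with margin at least σ, which is the "distant but label-preserving" trade-off the objective formalizes.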