GradAug: A New Regularization Method for Deep Neural Networks
We propose a new regularization method to alleviate over-fitting in deep neural networks. The key idea is to use randomly transformed training samples to regularize a set of sub-networks, which are generated by sampling the width of the original network, during training. In this way, the proposed method introduces self-guided disturbances to the raw gradients of the network and is therefore termed Gradient Augmentation (GradAug). We demonstrate that GradAug helps the network learn well-generalized and more diverse representations. Moreover, it is easy to implement and can be applied to various network structures and applications. GradAug improves ResNet-50 to 78.79% on ImageNet classification, a new state-of-the-art accuracy. Combined with CutMix, it further boosts performance to 79.67%, outperforming an ensemble of advanced training tricks. Generalization ability is evaluated on COCO object detection and instance segmentation, where GradAug significantly surpasses other state-of-the-art methods. GradAug is also robust to image distortions and FGSM adversarial attacks and is highly effective in low-data regimes.
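The training recipe the abstract describes — sample sub-network widths, feed the sub-networks randomly transformed inputs, and guide them with the full network's predictions — can be sketched with a self-contained toy. Everything below (the linear `ToyModel`, the per-feature "transform", the unweighted loss sum) is illustrative scaffolding under assumed names, not the paper's implementation, which operates on deep CNNs with image-level transforms:

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(p, y):
    # Negative log-likelihood of the true class index y.
    return -math.log(p[y] + 1e-12)

def kl_div(p, q):
    # KL(q || p): pushes sub-network output p toward full-network output q.
    return sum(qi * math.log((qi + 1e-12) / (pi + 1e-12))
               for pi, qi in zip(p, q))

class ToyModel:
    """Linear classifier whose 'width' keeps only the first fraction of
    input features, mimicking width-wise sub-network sampling."""
    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_out)]

    def forward(self, x, width=1.0):
        k = max(1, int(width * len(x)))  # number of active features
        return [sum(row[i] * x[i] for i in range(k)) for row in self.w]

def random_transform(x):
    # Stand-in for an image transform (e.g. random rescaling):
    # here, a random multiplicative jitter of the feature vector.
    s = random.uniform(0.8, 1.2)
    return [s * v for v in x]

def gradaug_loss(model, x, y, n_sub=3, w_lo=0.8):
    """Combined loss for one sample: full-network cross-entropy plus KL
    terms that align each randomly sliced sub-network (fed a transformed
    input) with the full network's soft prediction."""
    full = softmax(model.forward(x, width=1.0))
    loss = cross_entropy(full, y)
    for _ in range(n_sub):
        w = random.uniform(w_lo, 1.0)  # sample a sub-network width
        sub = softmax(model.forward(random_transform(x), width=w))
        loss += kl_div(sub, full)      # self-guided regularization term
    return loss

random.seed(0)
model = ToyModel(n_in=8, n_out=3)
x = [random.uniform(-1.0, 1.0) for _ in range(8)]
loss = gradaug_loss(model, x, y=1)
print(f"combined GradAug-style loss: {loss:.4f}")
```

Backpropagating this combined loss is what perturbs ("augments") the raw gradients of the full network, since each KL term contributes gradients computed through a narrower slice of the weights on a differently transformed input.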
We sincerely thank the reviewers for their insightful feedback! We address reviewer comments below. Table 1 shows a comparison on ImageNet, indicating that the training cost of GradAug is comparable with SOTA methods. Analyzing the effect of different sampling strategies is interesting, and we will certainly explore it in future work.
Review for NeurIPS paper: GradAug: A New Regularization Method for Deep Neural Networks
Summary and Contributions: After the rebuttal and discussion with other reviewers, I have updated my score. However, I point out several concerns of mine for which the authors could consider further validation. It is good that the authors performed the time/memory comparison in the rebuttal, as that was a significant concern of mine. My remaining concerns mostly revolve around which other techniques this should be compared against. Given that this algorithm takes 3-4x the time of the baseline, one could, for example: (1) train a much larger network and then use compression techniques to slim it to the same size, or (2) use Mixup, which is still 70% faster.
[R] NeurIPS-2020 paper: GradAug: A New Regularization Method for Deep Neural Networks