To be Robust or to be Fair: Towards Fairness in Adversarial Training

Xu, Han; Liu, Xiaorui; Li, Yaxin; Tang, Jiliang

arXiv.org Machine Learning 

Adversarial training algorithms have proven reliable for improving machine learning models' robustness against adversarial examples. However, we find that adversarial training algorithms tend to introduce a severe disparity in accuracy and robustness between different groups of data. This phenomenon occurs even on balanced datasets and does not appear in naturally trained models that use only clean samples. In this work, we theoretically show that this phenomenon can generally happen under adversarial training algorithms that minimize DNN models' robust errors. Motivated by these findings, we propose a Fair-Robust-Learning (FRL) framework to mitigate this unfairness problem in adversarial defenses, and experimental results validate the effectiveness of FRL.

The existence of adversarial examples (Goodfellow et al., 2014; Szegedy et al., 2013) raises serious concerns about applying deep neural networks to safety-critical tasks, such as autonomous driving and face identification. These adversarial examples are artificially crafted samples.
