Class-Disentanglement and Applications in Adversarial Detection and Defense

Neural Information Processing Systems 

What is the minimum necessary information required by a neural net $D(\cdot)$ from an image $x$ to accurately predict its class? Extracting such information in the input space from $x$ can locate the areas that $D(\cdot)$ mainly attends to and shed novel insight on the detection of and defense against adversarial attacks. In this paper, we propose ``class-disentanglement'', which trains a variational autoencoder $G(\cdot)$ to extract this class-dependent information as $x - G(x)$ via a trade-off between reconstructing $x$ by $G(x)$ and classifying $x$ by $D(x - G(x))$: the former competes with the latter in decomposing $x$, so that $x - G(x)$ retains only the information necessary for classification. We apply this decomposition to both clean images and their adversarial counterparts and find that the perturbations generated by adversarial attacks lie mainly in the class-dependent part $x - G(x)$. The decomposition results also provide novel interpretations of classification and attack models.
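The trade-off described above can be cast as a joint reconstruction-classification objective. Below is a minimal PyTorch sketch of one training step under stated assumptions: a VAE $G(\cdot)$ that returns a reconstruction together with its posterior parameters, and a pretrained, frozen classifier $D(\cdot)$. All names (`disentangle_step`, `lam`, `beta`) and the VAE interface are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def disentangle_step(G, D, x, y, optimizer, lam=1.0, beta=1e-3):
    """One optimization step of the reconstruction/classification trade-off.

    G: variational autoencoder, assumed to return (reconstruction, mu, logvar).
    D: pretrained classifier, kept frozen; gradients flow through it to G.
    lam, beta: hypothetical weights for the classification and KL terms.
    """
    optimizer.zero_grad()
    recon, mu, logvar = G(x)  # assumed VAE interface

    # Reconstruction term: G(x) should stay close to x,
    # pulling class-independent content into G(x).
    loss_rec = F.mse_loss(recon, x)

    # Standard VAE KL term against a unit-Gaussian prior.
    loss_kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    # Classification term: the residual x - G(x) must retain the
    # class-dependent information, i.e. D should still classify it correctly.
    loss_cls = F.cross_entropy(D(x - recon), y)

    loss = loss_rec + beta * loss_kl + lam * loss_cls
    loss.backward()
    optimizer.step()
    return loss.item()
```

Here `lam` controls the competition: a larger value pushes more class-dependent information into the residual $x - G(x)$, at the cost of reconstruction fidelity of $G(x)$.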