Characterizing the Optimal 0-1 Loss for Multi-class Classification with a Test-time Attacker
Wenxin Ding, Daniel Cullina
Neural Information Processing Systems
Finding classifiers robust to adversarial examples is critical for their safe deployment. Determining the robustness of the best possible classifier under a given threat model for a fixed data distribution, and comparing it to the robustness achieved by state-of-the-art training methods, is thus an important diagnostic tool. In this paper, we find achievable information-theoretic lower bounds on robust loss in the presence of a test-time attacker for multi-class classifiers on any discrete dataset. We provide a general framework for finding the optimal 0-1 loss that revolves around the construction of a conflict hypergraph from the data and the adversarial constraints. The prohibitive cost of this formulation in practice leads us to formulate other variants of the attacker-classifier game that more efficiently determine the range of the optimal loss.
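The sketch below is a minimal, hedged illustration of the kind of conflict-based linear program the abstract describes, not the paper's exact formulation: it uses only pairwise conflicts (a conflict graph rather than the full hypergraph, which yields a lower bound on the optimal loss), a toy one-dimensional discrete dataset, and an L-infinity attacker with budget `eps`. All variable names, the dataset, and the choice of scipy's `linprog` solver are assumptions made for illustration.

```python
# Minimal sketch (assumed formulation): pairwise conflict-graph LP giving a
# lower bound on the optimal robust 0-1 loss for a toy 1-D discrete dataset
# under an L-infinity test-time attacker with budget eps.
import numpy as np
from scipy.optimize import linprog

# Toy discrete dataset: feature value, label, and empirical probability mass.
xs = np.array([0.0, 1.0, 2.0, 5.0, 6.0])
ys = np.array([0,   1,   0,   1,   2  ])
ps = np.full(len(xs), 1.0 / len(xs))
eps = 0.6  # attacker's perturbation budget

# Two differently-labeled points conflict if the attacker can move both of them
# to a common perturbed input, i.e. their eps-balls intersect.
edges = [(u, v)
         for u in range(len(xs)) for v in range(u + 1, len(xs))
         if ys[u] != ys[v] and abs(xs[u] - xs[v]) <= 2 * eps]

# LP: maximize sum_v p_v * q_v  subject to  q_u + q_v <= 1 on every conflict
# edge and 0 <= q_v <= 1, where q_v is the probability that the (randomized)
# classifier labels point v correctly. Higher-order hyperedges are omitted
# here, so the resulting loss is only a lower bound on the optimal loss.
A_ub = np.zeros((len(edges), len(xs)))
for i, (u, v) in enumerate(edges):
    A_ub[i, u] = 1.0
    A_ub[i, v] = 1.0
b_ub = np.ones(len(edges))

res = linprog(c=-ps, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0.0, 1.0)] * len(xs), method="highs")
print("max correct-classification mass:", -res.fun)
print("lower bound on optimal robust 0-1 loss:", 1.0 + res.fun)
```

On this toy dataset the LP reports a loss of 0.4: the three mutually conflicting left-hand points can contribute at most two correct classifications, and one of the two right-hand points must be sacrificed. Replacing the pairwise constraints with constraints over all hyperedges of mutually attackable points would recover the exact optimal loss described in the abstract, at correspondingly higher cost.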