Certifiably Adversarially Robust Detection of Out-of-Distribution Data

Neural Information Processing Systems

Deep neural networks are known to be overconfident when applied to out-of-distribution (OOD) inputs that clearly do not belong to any class. This is a problem in safety-critical applications, since a reliable assessment of a classifier's uncertainty is a key property, allowing the system to trigger human intervention or to transfer into a safe state. In this paper, we aim for certifiable worst-case guarantees for OOD detection by enforcing low confidence not only at the OOD point itself but also in an $l_\infty$-ball around it. For this purpose, we use interval bound propagation (IBP) to upper bound the maximal confidence in the $l_\infty$-ball and minimize this upper bound at training time. We show that non-trivial bounds on the confidence for OOD data are attainable and that they generalize beyond the OOD dataset seen at training time. Moreover, in contrast to certified adversarial robustness, which typically comes with a significant loss in prediction performance, certified guarantees for worst-case OOD detection are possible without much loss in accuracy.
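To make the IBP step concrete, the following is a minimal sketch, in PyTorch, of how interval bounds can be propagated through a fully connected ReLU network and turned into an upper bound on the softmax confidence over an $l_\infty$-ball. The network structure, function names, and the final bounding step are illustrative assumptions, not the paper's actual architecture or implementation.

```python
# Hypothetical sketch: IBP bounds for a stack of nn.Linear / nn.ReLU layers,
# then an upper bound on max_k softmax_k over the l_inf-ball of radius eps.
import torch
import torch.nn as nn

def ibp_bounds(layers, x, eps):
    """Propagate elementwise lower/upper bounds through affine + ReLU layers."""
    lb, ub = x - eps, x + eps  # initial interval; for images one might clamp to [0, 1]
    for layer in layers:
        if isinstance(layer, nn.Linear):
            mid = (lb + ub) / 2            # interval center
            rad = (ub - lb) / 2            # interval radius
            mid = mid @ layer.weight.t() + layer.bias
            rad = rad @ layer.weight.abs().t()  # radius scales with |W|
            lb, ub = mid - rad, mid + rad
        elif isinstance(layer, nn.ReLU):
            # ReLU is monotone, so bounds pass through directly
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)
    return lb, ub

def confidence_upper_bound(layers, x, eps):
    """Upper bound on max_k softmax_k(f(x')) for all x' in the l_inf-ball.

    softmax_k is increasing in logit k and decreasing in the others, so it is
    maximized by taking the upper bound for logit k and lower bounds elsewhere.
    """
    lb, ub = ibp_bounds(layers, x, eps)
    n_classes = lb.shape[-1]
    per_class = []
    for k in range(n_classes):
        worst = lb.clone()
        worst[..., k] = ub[..., k]
        per_class.append(torch.softmax(worst, dim=-1)[..., k])
    return torch.stack(per_class, dim=-1).max(dim=-1).values
```

In a training loop following the paper's idea, one would presumably add a term penalizing this certified confidence bound on OOD batches to the standard cross-entropy loss on in-distribution batches; the exact loss weighting and architecture are given in the full paper.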


Review for NeurIPS paper: Certifiably Adversarially Robust Detection of Out-of-Distribution Data

Neural Information Processing Systems

Weaknesses: 1) The main weakness of the paper is the way it uses the phrase "worst case OOD detection", which is misleading and not discussed rigorously. In fact, as stated in the abstract, this means "worst case" *within the L_infinity balls around some specific OOD examples*. This paper is *not* providing guarantees about *arbitrary* OOD data, which is, to me, what the phrase "worst case OOD detection" sounds like it refers to. Low confidence can only be guaranteed locally around specific outliers. The empirical results suggest that this may be sufficient in practice in many cases, since exposure to (only) examples from Tiny Images helps provide provable levels of robustness on other OOD datasets at test time.