Can We Achieve Open Category Detection with Guarantees?
Liu, Si (Oregon State University) | Garrepalli, Risheek (Oregon State University) | Fern, Alan (Oregon State University) | Dietterich, Thomas G. (Oregon State University)
Open category detection is the problem of detecting "alien" test instances that belong to categories/classes that were not present in the training data. In many applications, reliably detecting such aliens is central to ensuring safety and/or quality of test data analysis. Unfortunately, to the best of our knowledge, there are no algorithms that provide theoretical guarantees on their ability to detect aliens under general assumptions. Further, while there are algorithms for open category detection, there are few empirical results that directly report alien-detection rates. Thus, there are significant theoretical and empirical gaps in our understanding of open category detection. In this paper, we take a step toward addressing this gap by studying a simplified, but practically relevant, variant of open category detection. In our setting, we are provided with a "clean" training set that contains only the target categories of interest. However, at test time, some fraction \alpha of the test examples are aliens. Under the assumption that we know an upper bound on \alpha, we develop an algorithm with PAC-style guarantees on the alien detection rate, while aiming to minimize false alarms. Our empirical results on synthetic and benchmark datasets demonstrate the regimes in which the algorithm can be effective and provide a baseline for further advancements.
- Technology: