We use T = 10 for max-softmax and T = 2 for divergence-based scoring functions. We report average performance over the last three epochs. Row 1 shows the standard setting, where the loss function is the KL divergence between the uniform distribution and the softmax output (Lee et al.; Hendrycks et al.) and the anomaly score is max-softmax. Row 3 features the reversed KL divergence. Minimizing the reversed divergence between the uniform distribution and the softmax distribution is equivalent to maximizing the softmax entropy.
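The equivalence between the reversed divergence and entropy maximization follows from the identity KL(p || U) = log C - H(p), where p is the softmax output over C classes and U is the uniform distribution. The following is a minimal NumPy sketch of that identity, not the authors' implementation. The function names are hypothetical, and treating T as a softmax temperature that divides the logits is an assumption.

```python
import numpy as np


def softmax(logits, T=1.0):
    """Softmax over the last axis; T is assumed to be a temperature dividing the logits."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i (assumes all p_i > 0)."""
    return -(p * np.log(p)).sum(axis=-1)


def kl_uniform_to_softmax(p):
    """Standard direction: KL(U || p) = -log C - (1/C) sum_i log p_i."""
    C = p.shape[-1]
    return -np.log(C) - np.log(p).mean(axis=-1)


def kl_softmax_to_uniform(p):
    """Reversed direction: KL(p || U) = sum_i p_i log(C p_i) = log C - H(p)."""
    C = p.shape[-1]
    return (p * np.log(C * p)).sum(axis=-1)
```

Since log C is constant for a fixed number of classes, minimizing `kl_softmax_to_uniform(p)` over the network parameters is the same as maximizing `entropy(p)`. The standard direction instead reduces to the cross-entropy between the uniform distribution and the softmax output, up to the same constant.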
Dec-30-2021, 13:20:22 GMT