68d3743587f71fbaa5062152985aff40-AuthorFeedback.pdf

Neural Information Processing Systems 

We thank the reviewers for their insightful comments. However, it makes the cross-entropy loss term negligible (Eq. Hence,itdoesnotequallyconstrain14 the network to produce small fractional concentration parameters (i.eαc = expzc(x) 0) for OOD examples,15 that leads to the desired multi-modal Dirichlet distributions (Fig 1d; paper). We achieve simi-25 lar performances as non-Bayesian OE [1], and better results for C100 and TIM.26