Training Uncertainty
Neural Information Processing Systems
The first subset (in red) is utilized to evaluate a traditional accuracy-based loss function $\ell_a$, such as the cross entropy. This benchmark is based on a loss function designed to incentivize the trained model to produce the smallest possible conformal prediction sets with the desired coverage (e.g., 90% if $\alpha = 0.1$). The hybrid training procedure is similar to Algorithm 1, in the sense that it relies on analogous soft-sorting, soft-ranking, and soft-indexing algorithms to evaluate a differentiable approximation $\tilde{W}_i$ of the conformity score $W_i$ in (8).

Above, the second equality follows directly from the fact that $S(x, U; \pi, t)$, defined in (A2), is by construction increasing in $t$, and therefore $Y \notin S(x, U; \pi, 1-\alpha)$ if and only if $\min\{t \in [0,1] : Y \in S(x, U; \pi, t)\} > 1-\alpha$. The proof consists of showing that $\ell_a$ and $\ell_u$ are separately minimized by $\hat{\pi} = \pi$, although only approximately in the latter case.
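As a rough illustration of how such a differentiable approximation can be built, the PyTorch sketch below replaces the hard sort/rank underlying a cumulative-probability conformity score with a sigmoid relaxation. It is a minimal sketch under stated assumptions: the names `soft_indicator`, `soft_conformity_scores`, and `hybrid_loss`, the `temperature` parameter, and the particular form of the uncertainty term $\ell_u$ are hypothetical stand-ins chosen for exposition, not the paper's Eq. (8), Algorithm 1, or its actual loss.

```python
import torch
import torch.nn.functional as F


def soft_indicator(x, temperature=0.1):
    # Sigmoid relaxation of the step function 1{x >= 0}; smaller
    # temperatures give a sharper, more rank-like cutoff.
    return torch.sigmoid(x / temperature)


def soft_conformity_scores(probs, labels, temperature=0.1):
    """Differentiable stand-in for a cumulative-probability conformity
    score: the (softly counted) probability mass of classes ranked at or
    above the true label. Hypothetical, not the paper's W_i in (8).

    probs:  (n, K) softmax outputs
    labels: (n,)   integer class indices
    """
    p_true = probs.gather(1, labels.unsqueeze(1))           # (n, 1)
    # Soft-ranking step: smoothly count classes whose predicted
    # probability exceeds that of the true class.
    exceeds = soft_indicator(probs - p_true, temperature)   # (n, K)
    return (probs * exceeds).sum(dim=1)                     # (n,)


def hybrid_loss(logits, labels, lam=1.0, temperature=0.1):
    # Accuracy term ell_a: standard cross entropy, as in the text.
    ell_a = F.cross_entropy(logits, labels)
    # Uncertainty term ell_u (assumed form): penalize large soft
    # conformity scores as a crude proxy for small prediction sets.
    probs = torch.softmax(logits, dim=1)
    ell_u = soft_conformity_scores(probs, labels, temperature).mean()
    return ell_a + lam * ell_u


# Toy usage: gradients flow through the soft-rank relaxation.
torch.manual_seed(0)
logits = torch.randn(32, 10, requires_grad=True)
labels = torch.randint(0, 10, (32,))
loss = hybrid_loss(logits, labels, lam=0.5)
loss.backward()
```

As the temperature shrinks, the soft indicator approaches the hard rank, so the sketch trades exactness of the conformity score for differentiability; in the text's setup, $\ell_a$ and $\ell_u$ would be evaluated on the two disjoint data subsets rather than on the same batch as done here.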