cross entropy loss
4c4c937b67cc8d785cea1e42ccea185c-Supplemental.pdf
Proof of Proposition 1. Due to Jensen's inequality and the fact that, by assumption, the distribution of human predictions $P(h \mid x)$ is not a point mass, it holds that $\mathbb{E}_{h}[\ell(h(x), y) \mid x] > \ell(\mu_h(x), y)$.

Proof of Theorem 3. We first provide the proof of the unconstrained case. Note that the above problem is a linear program and it decouples with respect to $x$. Therefore, for each $x$, the optimal solution is given by:

$$\pi_m(d = 1 \mid x) = \begin{cases} 1 & \text{if } \mathbb{E}_{y \mid x}\big[\ell(m(x), y) - \mathbb{E}_{h \mid x}[\ell(h, y)]\big] > 0 \\ 0 & \text{otherwise.} \end{cases}$$

Next, we provide the proof of the constrained case. To this aim, we consider the dual formulation of the optimization problem, where we only introduce a Lagrangian multiplier $\tau_{P,b}$ for the first constraint, i.e.,

$$\text{maximize} \quad \mathbb{E}_{x}\Big[\pi(x)\big(\mathbb{E}_{y,h \mid x}[\ell(h, y)] - \mathbb{E}_{y \mid x}[\ell(m(x), y)]\big)\Big] + \mathbb{E}_{x}\big[\tau_{P,b}\,(\pi(x) - b)\big] \tag{13}$$

$$\text{subject to} \quad 0 \le \pi(x) \le 1 \quad \forall x \in \mathcal{X}. \tag{14}$$

The inner minimization problem can be solved using an argument similar to the one used for the unconstrained case.
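The pointwise argument in the proof of Theorem 3 can be illustrated with a minimal sketch. It assumes the per-sample expected losses of the model and of the human are already available as plain lists. The function names, the argument names and the fixed multiplier value `tau` are hypothetical and are not taken from the paper. For a fixed multiplier, the objective in (13) is linear in each $\pi(x)$, so the same comparison applies with the loss gap compared against `tau` instead of zero.

```python
from typing import List, Sequence


def unconstrained_triage(model_loss: Sequence[float],
                         human_loss: Sequence[float]) -> List[int]:
    """Return d(x) per sample: 1 = defer to the human, 0 = keep the model.

    model_loss[i] stands for E_{y|x_i}[l(m(x_i), y)] and human_loss[i] for
    E_{y,h|x_i}[l(h, y)]. Both are assumed to be given (hypothetical inputs).
    """
    # Defer exactly when the model's expected loss exceeds the human's.
    return [1 if m > h else 0 for m, h in zip(model_loss, human_loss)]


def thresholded_triage(model_loss: Sequence[float],
                       human_loss: Sequence[float],
                       tau: float) -> List[int]:
    """Pointwise rule for a fixed multiplier tau >= 0 (an assumed value).

    For fixed tau the coefficient of pi(x) is
    human_loss - model_loss + tau, so pi(x) = 1 minimizes the objective
    whenever that coefficient is negative, i.e. model_loss - human_loss > tau.
    """
    return [1 if (m - h) > tau else 0 for m, h in zip(model_loss, human_loss)]


if __name__ == "__main__":
    m_loss = [0.9, 0.2, 0.5]
    h_loss = [0.3, 0.4, 0.5]
    print(unconstrained_triage(m_loss, h_loss))
    print(thresholded_triage(m_loss, h_loss, tau=0.7))
```

With `tau = 0` the second function reduces to the first. Increasing `tau` can only shrink the set of samples deferred to the human, which is how a budget on the triage level could be enforced in this sketch.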
In our method and all the baselines except surrogate-based triage, we use the cross-entropy loss and implement SGD using Adam optimizer [40] with initial learning rate set by cross validation independently for each method and level of triage $b$. In surrogate-based triage, we use the loss and optimization method used by the authors in their public implementation. Moreover, we use early stopping with the patience parameter $e_p = 10$, i.e., we stop the training process if no reduction of cross-entropy loss is observed on the validation set.
This suggests that the humans are more accurate than the predictive model throughout the entire feature space.
This suggests that the humans are less accurate than the predictive model in some regions of the feature space.
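The early-stopping rule in the training setup can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that "patience" means stopping after `patience` consecutive epochs without a reduction of the validation loss. The helper `train_with_early_stopping` and the callback `run_epoch`, which is expected to train for one epoch and return the validation cross-entropy loss, are hypothetical names.

```python
from typing import Callable, Tuple


def train_with_early_stopping(run_epoch: Callable[[int], float],
                              max_epochs: int,
                              patience: int = 10) -> Tuple[int, float]:
    """Run epochs until the validation loss stops improving.

    run_epoch(epoch) is a hypothetical callback that trains for one epoch
    and returns the validation loss. Returns (best_epoch, best_loss).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)
        if val_loss < best_loss:
            # Strict reduction: remember it and reset the patience counter.
            best_loss = val_loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # assumed stopping rule: patience exhausted

    return best_epoch, best_loss


if __name__ == "__main__":
    # Toy validation losses standing in for a real training loop.
    losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]
    print(train_with_early_stopping(lambda e: losses[e], len(losses), patience=3))
```

In a real training loop the model parameters at `best_epoch` would typically be checkpointed and restored after stopping; the sketch omits this.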