Supplementary Material for " Training Over-parameterized Models with Non-decomposable Objectives " Algorithm 2 Reductions-based Algorithm for Constraining Coverage (2)
Recall that the algorithms discussed in Section 2 have two intuitive steps: (i) update the multipliers $\lambda$ based on the current classifier's performance and construct a gain matrix $G$; (ii) train a new classifier by optimizing a cost-sensitive loss $\ell$. These algorithms additionally incorporate the "two dataset" trick suggested by Cotter et al. [14] for better generalization, wherein the updates on $\lambda$ are performed using a held-out validation set $S$. In Algorithm 2, we seek to find a saddle point of the Lagrangian max-min problem for (2). See Chen et al. [12] and Cotter et al. [16] for theoretical guarantees for the learned classifier; these guarantees usually require the algorithms to output a stochastic classifier that averages over the individual iterates $h^t$.

Eban et al. [25] provide details of how the optimization of these metrics can be posed as constrained optimization problems, and in turn reduced to cost-sensitive learning problems. More generally, using the reduction techniques from Narasimhan et al. [71], any learning problem of the following form can be reduced to cost-sensitive learning problems, and thus tackled by the proposed approach:
$$\max_h\, \psi\big(C[h]\big) \quad \text{s.t.} \quad \phi_j\big(C[h]\big) \le 0, \;\; \forall j,$$
where $C[h]$ denotes the confusion matrix of classifier $h$, and $\psi$ and the $\phi_j$'s are functions of the confusion matrix. Narasimhan et al. [71] provide details of the Lagrangian primal-dual optimization for different metrics.

We will find the following standard result to be useful in our proofs. Since the negative log-loss is strictly proper, in the sense of Gneiting and Raftery [30] and Williamson et al. [95], we have that:

Lemma 6 (Gneiting and Raftery [30], Williamson et al. [95]).
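To make the two-step loop concrete, the following is a minimal sketch of a Lagrangian primal-dual procedure for a coverage-type constraint, in the spirit of Algorithm 2 but not a reproduction of it. It assumes a linear scorer, uses a plain gain-weighted softmax surrogate for the cost-sensitive inner step (the paper instead advocates its calibrated cost-sensitive losses here), and the gain-matrix construction, step sizes, and names such as `target_cov` are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_cost_sensitive(X, y, G, lr=0.5, epochs=500):
    # Inner step (ii): fit a linear scorer that approximately maximizes the
    # expected gain E[G[y, h(x)]] via a gain-weighted softmax surrogate
    # (a stand-in for the paper's calibrated cost-sensitive losses).
    n, d = X.shape
    K = G.shape[0]
    W = np.zeros((d, K))
    Gy = G[y]                        # per-example gain rows, shape (n, K)
    for _ in range(epochs):
        P = softmax(X @ W)
        # Gradient of -sum_k G[y, k] * log p_k with respect to the logits.
        grad_logits = P * Gy.sum(axis=1, keepdims=True) - Gy
        W -= lr * (X.T @ grad_logits) / n
    return W

def coverage(W, X, K):
    # Fraction of examples predicted as each class.
    preds = np.argmax(X @ W, axis=1)
    return np.bincount(preds, minlength=K) / len(preds)

def primal_dual_coverage(X_tr, y_tr, X_val, K, target_cov,
                         outer_steps=20, dual_lr=1.0):
    # Outer loop: alternate (i) multiplier updates computed on the held-out
    # validation set (the "two dataset" trick) and (ii) cost-sensitive
    # retraining with the gain matrix implied by the Lagrangian.
    lam = np.zeros(K)                # multipliers for cov_k[h] >= target_cov[k]
    iterates = []
    for _ in range(outer_steps):
        # Lagrangian gain for predicting class j on an example of class y:
        # accuracy term 1[y == j] plus lam_j from the coverage constraints
        # (illustrative construction, not the paper's exact gain matrix).
        G = np.eye(K) + lam[None, :]
        W = train_cost_sensitive(X_tr, y_tr, G)
        iterates.append(W)
        # Projected dual ascent: raise lam_k when class k is under-covered.
        viol = target_cov - coverage(W, X_val, K)
        lam = np.maximum(0.0, lam + dual_lr * viol)
    return iterates, lam
```

Consistent with the guarantees cited above, one would typically deploy the stochastic classifier that picks one of the returned iterates uniformly at random, rather than the final iterate alone.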