which completes the proof.
Neyman-Pearson multiclass classification under label noise via empirical likelihood
Qiong Zhang, Qinglong Tian, Pengfei Li
In many classification problems, the costs of misclassifying observations from different classes can be highly unequal. The Neyman-Pearson multiclass classification (NPMC) framework addresses this issue by minimizing a weighted misclassification risk while imposing upper bounds on class-specific error probabilities. Existing NPMC methods typically assume that training labels are correctly observed. In practice, however, labels are often corrupted due to measurement or annotation error, and the effect of such label noise on NPMC procedures remains largely unexplored. We study the NPMC problem when only noisy labels are available in the training data. We propose an empirical likelihood (EL)-based method that relates the distributions of noisy and true labels through an exponential tilting density ratio model. The resulting maximum EL estimators recover the class proportions and posterior probabilities of the clean labels required for error control. We establish consistency, asymptotic normality, and optimal convergence rates for these estimators. Under mild conditions, the resulting classifier satisfies NP oracle inequalities with respect to the true labels asymptotically. An expectation-maximization algorithm computes the maximum EL estimators. Simulations show that the proposed method performs comparably to the oracle classifier under clean labels and substantially improves over procedures that ignore label noise.
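The abstract does not spell out the form of the exponential tilting density ratio model. A common parameterization from the density-ratio literature (an assumption here, not necessarily the paper's exact specification) links the class-conditional densities of the covariates to a baseline class through a log-linear tilt:

```latex
\frac{f_k(x)}{f_0(x)} = \exp\{\alpha_k + \beta_k^{\top} x\}, \qquad k = 1, \dots, K,
```

where $f_k$ denotes the covariate density in class $k$, class $0$ serves as the baseline, and $(\alpha_k, \beta_k)$ are tilting parameters to be estimated; the empirical likelihood then profiles out the unspecified baseline $f_0$.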
e9bf14a419d77534105016f5ec122d62-Supplemental.pdf
Therefore, if ν(·) < +∞, then we can bound (10) with e^α ν(·). To avoid crowded notation, we drop the conditioning on z from Pr[· | ρ = z]. The issue is how to proceed. Let φ be the standard normal density function and Φ be the CDF. The algorithm uses SVT such that it only releases the private answers to the queries if the answer is sufficiently different from the "guess".
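The release rule described above follows the standard Sparse Vector Technique (SVT) pattern. A minimal sketch of that pattern is below; the function name, noise scales, and parameters are illustrative assumptions, not taken from this supplement:

```python
import numpy as np

def sparse_vector(queries, guesses, threshold, epsilon, sensitivity=1.0, rng=None):
    """SVT sketch: release a query's answer only when it differs from the
    public 'guess' by more than a noisy threshold; otherwise output the guess."""
    rng = rng or np.random.default_rng(0)
    # Laplace noise on the threshold (scale split is one common convention)
    rho = rng.laplace(0.0, 2 * sensitivity / epsilon)
    out = []
    for q, g in zip(queries, guesses):
        # fresh Laplace noise on each query's gap from its guess
        nu = rng.laplace(0.0, 4 * sensitivity / epsilon)
        if abs(q - g) + nu > threshold + rho:
            out.append(q)   # sufficiently different: release the answer
        else:
            out.append(g)   # otherwise fall back to the guess
    return out
```

The privacy savings come from the fallback branch: answers close to the guess leak nothing beyond the (noisy) comparison itself.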
a376033f78e144f494bfc743c0be3330-Supplemental.pdf
In this section, we provide a theoretical analysis of HSPG. Moreover, we further point out that: (1) the Sub-gradient Descent Step we used to achieve a "close enough" solution can be replaced by other methods, and (2) Assumption 4 is only a sufficient condition that we could use to show the "close enough" condition. B.1 Related Work Problem (12) has been well studied in deterministic optimization, with various algorithms capable of returning solutions with both low objective value and high group sparsity under proper λ (95; 73; 42; 64). For example, the proximal stochastic variance-reduced gradient method (Prox-SVRG) (88) and proximal spider (Prox-Spider) (97) are developed to adopt multi-stage schemes based on the well-known variance reduction technique SVRG proposed in (46) and Spider developed in (22), respectively. Under Assumption 1, the search direction d_k is a descent direction for ψ_{B_k}(x_k), i.e., d_k^T ∇ψ_{B_k}(x_k) < 0.
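The proximal methods cited above (Prox-SVRG, Prox-Spider) all rely on the proximal operator of the group-sparsity regularizer. A minimal sketch of that operator for the group-lasso penalty λ Σ_g ||x_g||_2 is below; the function name and group encoding are illustrative assumptions, not this paper's implementation:

```python
import numpy as np

def prox_group_l2(x, groups, lam, step):
    """Proximal operator of step * lam * sum_g ||x_g||_2:
    shrink each group's Euclidean norm, zeroing out small groups entirely."""
    out = x.copy()
    for g in groups:  # each g is a list of coordinate indices
        norm = np.linalg.norm(x[g])
        # blockwise soft-thresholding: groups with norm <= step*lam vanish
        scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
        out[g] = scale * x[g]
    return out
```

Zeroing whole groups at once is what produces the group sparsity these methods target; a plain entrywise soft-threshold would only zero individual coordinates.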
789ba2ae4d335e8a2ad283a3f7effced-Supplemental.pdf
A is given by A_{k,ℓ} := Pr[y_k(x) = ℓ], which represents the probability of the kth service producing label ℓ. The scalar function F_{k,ℓ}(X) := Pr[q_k(x) ≤ X | y(x) = ℓ] is the probability that the quality score produced by the kth service is less than a threshold X, conditional on its predicted label being ℓ. There are 3 steps for solving Problem 3.3. Thus, ψ_i(·) and ψ_{i,j}(·) are piecewise quadratic functions. To solve Problem 3.2, let us first denote Ω3 = {x … In other words, z0 has the same objective function value as z.
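The quantities A_{k,ℓ} and F_{k,ℓ}(X) defined above have natural plug-in estimates from held-out data for a single service k. A minimal sketch under that assumption (function name and array layout are illustrative, not from this supplement):

```python
import numpy as np

def estimate_service_stats(preds, scores, labels, num_classes):
    """For one service k: A[l] is the empirical frequency of predicted
    label l, and F(l, X) is the empirical CDF of the quality score
    restricted to examples whose conditioning label equals l."""
    A = np.bincount(preds, minlength=num_classes) / len(preds)
    def F(l, X):
        s = scores[labels == l]
        return float(np.mean(s <= X)) if len(s) else 0.0
    return A, F
```

Because F(l, ·) is an empirical CDF, it is a piecewise-constant nondecreasing step function, which is what makes the ψ functions built from it piecewise quadratic after integration.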