Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar

arXiv.org Machine Learning 

Classification problems with a large number of labels arise in language modelling [Mikolov et al., 2013, Levy and Goldberg, 2014], recommender systems [Covington et al., 2016, Xu et al., 2016], and information retrieval [Agrawal et al., 2013, Prabhu and Varma, 2014]. Such large-output problems pose a core challenge: losses such as the softmax cross-entropy can be prohibitive to optimise, as they depend on the entire set of labels. Several works have thus devised negative sampling schemes for efficiently and effectively approximating such losses [Bengio and Senecal, 2008, Blanc and Rendle, 2018, Ruiz et al., 2018, Bamler and Mandt, 2020]. Broadly, negative sampling techniques sample a subset of "negative" labels, which are used to contrast against the observed "positive" labels. One further applies a suitable weighting on these "negatives", which ostensibly corrects the sampling bias introduced by the dependence on a random subset of labels. Intuitively, such bias assesses how closely a scheme approximates the unsampled loss on the full label set. This bias is well understood for sampled softmax schemes (see, e.g., Bengio and Senecal [2008]); surprisingly, however, far less is understood about other popular schemes, e.g., within-batch and uniform sampling (cf.
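
As a rough illustration of the sampling-bias correction discussed above, the sketch below implements a sampled softmax cross-entropy with the standard log-Q (importance-weighting) correction in the spirit of Bengio and Senecal [2008]. The function name, its signature, and the array shapes are illustrative assumptions, not code from the paper.

    import numpy as np

    def sampled_softmax_loss(pos_logits, neg_logits, neg_probs):
        """Sampled softmax cross-entropy with a log-Q bias correction.

        pos_logits: shape (batch,), scores of the observed positive label.
        neg_logits: shape (batch, m), scores of m sampled negative labels.
        neg_probs:  shape (batch, m), proposal probability under which each
                    negative was drawn (e.g. 1/num_labels for uniform sampling).
        """
        # Log-Q correction: subtracting the log proposal probability from each
        # sampled negative's logit reduces the bias introduced by scoring only
        # a random subset of labels instead of the full label set.
        corrected_neg = neg_logits - np.log(neg_probs)

        # The positive logit is left uncorrected; place it in column 0.
        logits = np.concatenate([pos_logits[:, None], corrected_neg], axis=1)

        # Numerically stable log-sum-exp over the sampled label set.
        m = logits.max(axis=1, keepdims=True)
        log_z = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))

        # Cross-entropy of the positive label against the sampled partition function.
        return np.mean(log_z - pos_logits)

    # Hypothetical usage: uniform sampling of 20 negatives over 100k labels.
    num_labels, m_neg, batch = 100_000, 20, 4
    rng = np.random.default_rng(0)
    pos = rng.normal(size=(batch,))
    neg = rng.normal(size=(batch, m_neg))
    uniform_probs = np.full((batch, m_neg), 1.0 / num_labels)
    print(sampled_softmax_loss(pos, neg, uniform_probs))

With uniform sampling the correction term is a constant and only shifts the negative logits; for non-uniform proposals (e.g. within-batch frequencies) the per-label correction is what makes the sampled loss a closer approximation of the full softmax cross-entropy.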
