Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar

arXiv.org Machine Learning 

Classification problems with a large number of labels arise in language modelling [Mikolov et al., 2013, Levy and Goldberg, 2014], recommender systems [Covington et al., 2016, Xu et al., 2016], and information retrieval [Agrawal et al., 2013, Prabhu and Varma, 2014]. Such large-output problems pose a core challenge: losses such as the softmax cross-entropy can be prohibitive to optimise, as they depend on the entire set of labels. Several works have thus devised negative sampling schemes for efficiently and effectively approximating such losses [Bengio and Senecal, 2008, Blanc and Rendle, 2018, Ruiz et al., 2018, Bamler and Mandt, 2020]. Broadly, negative sampling techniques sample a subset of "negative" labels, which are used to contrast against the observed "positive" labels. One further applies a suitable weighting on these "negatives", which ostensibly corrects the sampling bias introduced by the dependence on a random subset of labels. Intuitively, such bias assesses how closely a scheme approximates the unsampled loss on the full label set. This bias is well understood for sampled softmax schemes (see, e.g., Bengio and Senecal [2008]); surprisingly, however, far less is understood about other popular schemes, e.g., within-batch and uniform sampling (cf.
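
As a rough illustration of the sampling-bias correction discussed above, the sketch below implements a sampled softmax cross-entropy with the standard log-Q (importance-weighting) correction in the spirit of Bengio and Senecal [2008]. The function name, its signature, and the array shapes are illustrative assumptions, not code from the paper.

    import numpy as np

    def sampled_softmax_loss(pos_logits, neg_logits, neg_probs):
        """Sampled softmax cross-entropy with a log-Q bias correction.

        pos_logits: shape (batch,), scores of the observed positive label.
        neg_logits: shape (batch, m), scores of m sampled negative labels.
        neg_probs:  shape (batch, m), proposal probability under which each
                    negative was drawn (e.g. 1/num_labels for uniform sampling).
        """
        # Log-Q correction: subtracting the log proposal probability from each
        # sampled negative's logit reduces the bias introduced by scoring only
        # a random subset of labels instead of the full label set.
        corrected_neg = neg_logits - np.log(neg_probs)

        # The positive logit is left uncorrected; place it in column 0.
        logits = np.concatenate([pos_logits[:, None], corrected_neg], axis=1)

        # Numerically stable log-sum-exp over the sampled label set.
        m = logits.max(axis=1, keepdims=True)
        log_z = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))

        # Cross-entropy of the positive label against the sampled partition function.
        return np.mean(log_z - pos_logits)

    # Hypothetical usage: uniform sampling of 20 negatives over 100k labels.
    num_labels, m_neg, batch = 100_000, 20, 4
    rng = np.random.default_rng(0)
    pos = rng.normal(size=(batch,))
    neg = rng.normal(size=(batch, m_neg))
    uniform_probs = np.full((batch, m_neg), 1.0 / num_labels)
    print(sampled_softmax_loss(pos, neg, uniform_probs))

With uniform sampling the correction term is a constant and only shifts the negative logits; for non-uniform proposals (e.g. within-batch frequencies) the per-label correction is what makes the sampled loss a closer approximation of the full softmax cross-entropy.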
