Learning Supervised Topic Models for Classification and Regression from Crowds

Rodrigues, Filipe; Lourenço, Mariana; Ribeiro, Bernardete; Pereira, Francisco

arXiv.org Machine Learning 

Hence, it is seldom the case that a single oracle labels an entire collection. Furthermore, the Web, through its social nature, also exploits the wisdom of crowds to annotate large collections of documents and images. By categorizing texts, tagging images, or rating products and places, Web users are generating large volumes of labeled content. However, when learning supervised models from crowds, the quality of labels can vary significantly due to task subjectivity and differences in annotator reliability (or bias) [9], [10]. In a sentiment analysis task, for example, the subjectivity of the exercise tends to produce considerably different labels from different annotators. Similarly, online product reviews are known to vary considerably depending on the personal biases and volatility of the reviewer's opinions. It is therefore essential to account for these issues when learning from this increasingly common type of data. Hence, researchers' interest in building models that take the reliabilities of different annotators into consideration and mitigate the effect of their biases has spiked during the last few years (e.g.
