Learning Supervised Topic Models for Classification and Regression from Crowds
Rodrigues, Filipe; Lourenço, Mariana; Ribeiro, Bernardete; Pereira, Francisco
Hence, it is seldom the case that a single oracle labels an entire collection. Furthermore, the Web, through its social nature, also exploits the wisdom of crowds to annotate large collections of documents and images. By categorizing texts, tagging images or rating products and places, Web users are generating large volumes of labeled content. However, when learning supervised models from crowds, the quality of labels can vary significantly due to task subjectivity and differences in annotator reliability (or bias) [9], [10]. In a sentiment analysis task, for instance, the subjectivity of the exercise is prone to produce considerably different labels from different annotators. Similarly, online product reviews are known to vary considerably depending on the personal biases and volatility of the reviewer's opinions. It is therefore essential to account for these issues when learning from this increasingly common type of data. Hence, the interest of researchers in building models that take the reliabilities of different annotators into consideration and mitigate the effect of their biases has spiked during the last few years (e.g.
Aug-17-2018
- Country:
- North America > United States
- Massachusetts > Middlesex County > Cambridge (0.04)
- Europe
- Portugal > Coimbra
- Coimbra (0.05)
- Denmark > Capital Region
- Kongens Lyngby (0.04)
- Asia
- Middle East > Jordan (0.04)
- Singapore (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Media (0.68)