HQP: A Human-Annotated Dataset for Detecting Online Propaganda

Maarouf, Abdurahman, Bär, Dominik, Geissler, Dominique, Feuerriegel, Stefan

May-1-2023–arXiv.org Artificial Intelligence

Online propaganda poses a severe threat to the integrity of societies. However, existing datasets for detecting online propaganda have a key limitation: they were annotated using weak labels that can be noisy and even incorrect. To address this limitation, our work makes the following contributions: (1) We present HQP: a novel dataset (N=30,000) for detecting online propaganda with high-quality labels. To the best of our knowledge, HQP is the first dataset for detecting online propaganda that was created through human annotation. (2) We show empirically that state-of-the-art language models fail to detect online propaganda when trained with weak labels (AUC: 64.03). In contrast, state-of-the-art language models can accurately detect online propaganda when trained with our high-quality labels (AUC: 92.25), which is a relative improvement of ~44%. (3) To address the cost of labeling, we extend our work to few-shot learning. Specifically, we show that prompt-based learning using a small sample of high-quality labels can still achieve reasonable performance (AUC: 80.27). Finally, we discuss implications for the NLP community to balance the cost and quality of labeling. Crucially, our work highlights the importance of high-quality labels for sensitive NLP tasks such as propaganda detection.
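
The ~44% figure in the abstract appears to be a relative gain over the weak-label baseline rather than a difference in AUC points. A minimal sketch of that arithmetic follows, assuming this reading; the function name relative_improvement is hypothetical and only the two AUC values come from the abstract.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a percentage of `baseline`."""
    return (improved - baseline) / baseline * 100.0


# AUC values reported in the abstract (on a 0-100 scale).
weak_label_auc = 64.03      # trained with weak labels
high_quality_auc = 92.25    # trained with high-quality (human) labels

# Assumption: "~44%" refers to the relative gain, not the point difference.
absolute_gain = high_quality_auc - weak_label_auc
relative_gain = relative_improvement(weak_label_auc, high_quality_auc)

print(f"Absolute gain: {absolute_gain:.2f} AUC points")
print(f"Relative gain: {relative_gain:.1f}%")
```

Under this reading, the absolute gap is about 28 AUC points, while the relative gain rounds to the ~44% quoted above.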

machine learning, natural language, propaganda

Country:
- Asia > Russia (1.00)
- Europe
  - Russia (0.06)
  - United Kingdom (0.04)
  - Ukraine
    - Luhansk Oblast (0.04)
    - Kherson Oblast > Kherson (0.04)
    - Donetsk Oblast
      - Mariupol (0.04)
      - Donetsk (0.04)
  - Germany > Bavaria
    - Upper Bavaria > Munich (0.04)

Genre:
- Research Report (0.82)

Industry:
- Media > News (0.68)
- Government
  - Military (1.00)
  - Regional Government
    - Europe Government > Russia Government (0.94)
    - Asia Government > Russia Government (0.94)

Technology:
- Information Technology
  - Communications > Social Media (1.00)
  - Artificial Intelligence
    - Natural Language (1.00)
    - Machine Learning > Neural Networks (1.00)
