Stanceosaurus: Classifying Stance Towards Multilingual Misinformation
Zheng, Jonathan, Baheti, Ashutosh, Naous, Tarek, Xu, Wei, Ritter, Alan
arXiv.org Artificial Intelligence
We present Stanceosaurus, a new corpus of 28,033 tweets in English, Hindi, and Arabic annotated with stance towards 251 misinformation claims. As far as we are aware, it is the largest corpus annotated with stance towards misinformation claims. The claims in Stanceosaurus originate from 15 fact-checking sources that cover diverse geographical regions and cultures. Unlike existing stance datasets, we introduce a more fine-grained 5-class labeling strategy with additional subcategories to distinguish implicit stance. Pre-trained transformer-based stance classifiers that are fine-tuned on our corpus show good generalization on unseen claims and regional claims from countries outside the training data. Cross-lingual experiments demonstrate Stanceosaurus' capability of training multilingual models, achieving 53.1 F1 on Hindi and 50.4 F1 on Arabic without any target-language fine-tuning. Finally, we show how a domain adaptation method can be used to improve performance on Stanceosaurus using additional RumourEval-2019 data. We make Stanceosaurus publicly available to the research community and hope it will encourage further work on misinformation identification across languages and cultures.
Oct-28-2022
- Country:
- South America > Brazil (0.04)
- Antarctica (0.04)
- Oceania
- New Zealand (0.04)
- Australia > Western Australia (0.04)
- North America
- Dominican Republic (0.04)
- Barbados (0.04)
- United States
- Maine (0.04)
- Louisiana (0.04)
- Arizona > Maricopa County (0.04)
- New York > New York County
- New York City (0.04)
- New Mexico > Santa Fe County
- Santa Fe (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Colorado > Boulder County
- Boulder (0.04)
- California > San Diego County
- San Diego (0.04)
- Canada
- Ontario > Toronto (0.04)
- Quebec (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Europe
- Finland (0.04)
- Ukraine (0.04)
- Sweden (0.04)
- Bulgaria > Varna Province
- Varna (0.04)
- Italy > Tuscany
- Florence (0.04)
- Germany
- Berlin (0.04)
- Bavaria > Middle Franconia
- Nuremberg (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Spain
- Valencian Community > Valencia Province
- Valencia (0.04)
- Catalonia > Barcelona Province
- Barcelona (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- United Kingdom
- Scotland > City of Edinburgh
- Edinburgh (0.04)
- England > Oxfordshire
- Oxford (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Asia
- Middle East > Palestine (0.14)
- India (0.05)
- Singapore (0.04)
- Malaysia (0.04)
- Afghanistan (0.04)
- Pakistan > Sindh
- Karachi Division > Karachi (0.04)
- Japan > Honshū
- Kansai > Osaka Prefecture > Osaka (0.04)
- China
- Hong Kong (0.04)
- Hubei Province > Wuhan (0.04)
- Africa
- South Sudan (0.04)
- Malawi (0.04)
- Madagascar (0.04)
- Eswatini (0.04)
- Middle East > Egypt
- Giza Governorate > Giza (0.04)
- Genre:
- Research Report (0.82)
- Industry:
- Media > News (1.00)
- Materials > Chemicals (1.00)
- Health & Medicine
- Pharmaceuticals & Biotechnology (1.00)
- Epidemiology (1.00)
- Therapeutic Area
- Pulmonary/Respiratory Diseases (1.00)
- Oncology (1.00)
- Infections and Infectious Diseases (1.00)
- Immunology (1.00)
- Vaccines (0.96)
- Government
- Technology: