Time-Aware Datasets are Adaptive Knowledgebases for the New Normal

Suprem, Abhijit, Vaidya, Sanjyot, Ferreira, Joao Eduardo, Pu, Calton

Nov-22-2022–arXiv.org Artificial Intelligence

Recent advances in text classification and knowledge capture in language models have relied on the availability of large-scale text datasets. However, language models are trained on static snapshots of knowledge and are limited when that knowledge evolves. This is especially critical for misinformation detection, where new types of misinformation continuously appear, replacing old campaigns. We propose time-aware misinformation datasets to capture time-critical phenomena. In this paper, we first present evidence of evolving misinformation and show that incorporating even simple time-awareness significantly improves classifier accuracy. Second, we present COVID-TAD, a large-scale COVID-19 misinformation dataset spanning 25 months. It is the first large-scale misinformation dataset that contains multiple snapshots of a datastream and is orders of magnitude larger than related misinformation datasets. We describe the collection and labeling process, as well as preliminary experiments.
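The abstract does not say how time-awareness is incorporated, so the following is only a minimal sketch of one possible reading: treat the datastream as monthly snapshots and train on a short window of snapshots immediately preceding the one being evaluated. The function names, the `window` parameter, and the toy samples are all hypothetical placeholders, not the authors' method or data.

```python
from datetime import date

def month_index(d: date) -> int:
    """Map a date to a running month count so snapshots can be compared."""
    return d.year * 12 + (d.month - 1)

def time_aware_split(samples, test_month, window=3):
    """Split (text, label, date) samples by monthly snapshot.

    test  = samples from the snapshot `test_month`, given as (year, month)
    train = samples from the `window` snapshots immediately before it
    Older samples are dropped. This windowing rule is an assumption made
    for illustration, not something stated in the abstract.
    """
    test_idx = month_index(date(test_month[0], test_month[1], 1))
    train, test = [], []
    for text, label, d in samples:
        idx = month_index(d)
        if idx == test_idx:
            test.append((text, label))
        elif test_idx - window <= idx < test_idx:
            train.append((text, label))
    return train, test

# Placeholder samples (invented for illustration only).
samples = [
    ("claim a", 1, date(2020, 3, 14)),
    ("claim b", 0, date(2020, 11, 2)),
    ("claim c", 1, date(2020, 12, 20)),
    ("claim d", 0, date(2021, 1, 5)),
    ("claim e", 1, date(2021, 2, 9)),
]

train, test = time_aware_split(samples, test_month=(2021, 2), window=3)
```

Comparing a classifier trained on `train` against one trained on all earlier samples, both scored on `test`, would be one way to probe whether recent snapshots matter more than old ones.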

artificial intelligence, machine learning, natural language, (17 more...)

Country:
- South America > Brazil
  - São Paulo (0.04)
- North America > United States
  - California (0.04)
  - Arizona (0.04)
- Europe > United Kingdom
  - England (0.04)
- Asia > China
  - Hubei Province > Wuhan (0.04)

Genre:
- Research Report (0.64)

Industry:
- Media > News (1.00)
- Health & Medicine
  - Epidemiology (1.00)
  - Therapeutic Area
    - Pulmonary/Respiratory Diseases (1.00)
    - Infections and Infectious Diseases (1.00)
    - Immunology (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.68)
  - Robots > Autonomous Vehicles (0.47)
  - Natural Language > Chatbot (0.46)
