Misinformation Detection using Large Language Models with Explainability

Patel, Jainee; Bhatt, Chintan; Trivedi, Himani; Nguyen, Thanh Thi

arXiv.org Artificial Intelligence 

The COVID Fake News dataset is a collection of news headlines and brief claims, most of them specific to the COVID-19 pandemic. It combines verified factual statements with the misleading or outright false information that circulated widely on digital platforms during the pandemic. The dataset was preprocessed and split into training (8,160 samples) and test (2,041 samples) sets with balanced class proportions, so that performance on both real and fake labels could be evaluated robustly. To check whether the pipeline generalizes to domains beyond the pandemic, the FakeNewsNet GossipCop dataset was also used. It covers entertainment and celebrity news, a domain in which gossip, rumors, and fabricated stories are prevalent. Approximately 10,000 samples were used for training and 2,500 for testing. In this dataset, each news item is labeled Real or Fake according to fact-checks from the original GossipCop platform. The two datasets were combined, standardized, and stratified to keep the classes balanced during training and validation. Training on data prepared in this way helps the models pick up subtle linguistic cues that distinguish genuine claims from fabricated ones, which can improve the pipeline's performance in practical misinformation detection applications.
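The combine, standardize, and stratify step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' preprocessing code: the record fields (`text`, `label`, `source`), the helper names `standardize` and `stratified_split`, the toy records, and the 0.2 test fraction (chosen to match the roughly 80/20 ratio implied by the reported sample counts) are all assumptions.

```python
import random
from collections import defaultdict


def standardize(text, label, source):
    """Map a raw example to a common schema (hypothetical field names)."""
    return {"text": text.strip(), "label": label.strip().lower(), "source": source}


def stratified_split(records, test_fraction=0.2, seed=42):
    """Split records so each label keeps the same share in train and test."""
    rng = random.Random(seed)

    # Group records by class label.
    by_label = defaultdict(list)
    for record in records:
        by_label[record["label"]].append(record)

    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])

    # Shuffle again so the classes are interleaved within each split.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Toy stand-ins for the two corpora (assumed data, not real samples).
covid = [
    standardize(f"covid claim {i}", "Real" if i % 2 == 0 else "Fake", "covid")
    for i in range(100)
]
gossip = [
    standardize(f"gossip headline {i}", "REAL" if i % 2 == 0 else "FAKE", "gossipcop")
    for i in range(100)
]

combined = covid + gossip
train, test = stratified_split(combined, test_fraction=0.2)
```

Here the split is stratified on the class label only. Stratifying on the pair of label and source would additionally keep the share of each corpus fixed across the splits, if that is required.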
