real news
Prompt-Induced Linguistic Fingerprints for LLM-Generated Fake News Detection
Wang, Chi, Gao, Min, Wang, Zongwei, Yin, Junwei, Shu, Kai, Lin, Chenghua
With the rapid development of large language models, the generation of fake news has become increasingly effortless, posing a growing societal threat and underscoring the urgent need for reliable detection methods. Early efforts to identify LLM-generated fake news have predominantly focused on the textual content itself; however, because much of that content may appear coherent and factually consistent, the subtle traces of falsification are often difficult to uncover. Through distributional divergence analysis, we uncover prompt-induced linguistic fingerprints: statistically distinct probability shifts between LLM-generated real and fake news when maliciously prompted. Based on this insight, we propose a novel method named Linguistic Fingerprints Extraction (LIFE). By reconstructing word-level probability distributions, LIFE can find discriminative patterns that facilitate the detection of LLM-generated fake news. To further amplify these fingerprint patterns, we also leverage key-fragment techniques that accentuate subtle linguistic differences, thereby improving detection reliability. Our experiments show that LIFE achieves state-of-the-art performance in detecting LLM-generated fake news and maintains high performance on human-written fake news. The code and data are available at https://anonymous.4open.science/r/LIFE-E86A.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > China > Chongqing Province > Chongqing (0.05)
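The idea of a distributional divergence between real and fake generations, as described in the LIFE abstract, can be illustrated with a toy computation. This is a minimal sketch and not the LIFE method itself: the two word-probability distributions are hypothetical, and the Jensen-Shannon divergence is an assumption chosen for illustration, since the abstract does not name the divergence it uses.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two word distributions.

    p and q map words to probabilities; a missing word counts as probability 0.
    The result lies in [0, 1]: 0 for identical distributions, 1 for disjoint ones.
    """
    vocab = set(p) | set(q)
    total = 0.0
    for word in vocab:
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        mw = 0.5 * (pw + qw)  # mixture distribution at this word
        if pw > 0:
            total += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            total += 0.5 * qw * math.log2(qw / mw)
    return total


# Hypothetical word-level distributions (illustrative values only).
real_dist = {"officials": 0.5, "confirmed": 0.3, "report": 0.2}
fake_dist = {"officials": 0.2, "shocking": 0.5, "report": 0.3}

score = js_divergence(real_dist, fake_dist)
```

A larger `score` means the two distributions differ more; a detector built on this idea would need real model probabilities rather than hand-written dictionaries.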
Exploring Text Representations for Online Misinformation
Mis- and disinformation, commonly collectively called fake news, continue to menace society. The impact of this age-old problem is perhaps currently most evident in politics and healthcare. However, fake news is affecting an increasing number of domains. It takes many different forms and continues to shapeshift as technology advances, though it arguably spreads most widely in textual form, e.g., through social media posts and blog articles. Thus, it is imperative to thwart the spread of textual misinformation, which necessitates its initial detection. This thesis contributes to the creation of representations that are useful for detecting misinformation. Firstly, it develops a novel method for extracting textual features from news articles for misinformation detection. These features harness the disparity between the thematic coherence of authentic and false news stories. In other words, the composition of themes discussed in the two groups differs significantly as the story progresses. Secondly, it demonstrates the effectiveness of topic features for fake news detection, using classification and clustering. Clustering is particularly useful because it alleviates the need for a labelled dataset, which can be labour-intensive and time-consuming to amass. More generally, it contributes towards a better understanding of misinformation and ways of detecting it using Machine Learning and Natural Language Processing.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > France (0.04)
- Europe > Russia (0.04)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Research Report > Promising Solution (0.65)
- Media > News (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Impact of Fake News on Social Media Towards Public Users of Different Age Groups
Hakim, Kahlil bin Abdul, Easwaramoorthy, Sathishkumar Veerappampalayam
This study examines how fake news affects social media users across a range of age groups and how machine learning (ML) and artificial intelligence (AI) can help reduce the spread of false information. The paper evaluates various machine learning models for their efficacy in identifying and categorizing fake news and examines current trends in the spread of fake news, including deepfake technology. The study assesses four models using a Kaggle dataset: Random Forest, Support Vector Machine (SVM), Neural Networks, and Logistic Regression. The results show that SVM and neural networks perform better than the other models, with accuracies of 93.29% and 93.69%, respectively. The study also emphasises how a diminished capacity for critical analysis of news content makes people in the older age group more susceptible to disinformation. Natural language processing (NLP) and deep learning approaches have the potential to improve the accuracy of fake news detection. Despite these developments, biases in AI and ML models and difficulties in identifying AI-generated information continue to be major problems. The study recommends that datasets be expanded to encompass a wider range of languages and that detection algorithms be continuously improved to keep up with the latest advancements in disinformation tactics. In order to combat fake news and promote an informed and resilient society, this study emphasizes the value of cooperative efforts between AI researchers, social media platforms, and governments.
- North America > United States (0.15)
- Europe (0.04)
- Asia > Malaysia (0.04)
- Research Report > New Finding (0.70)
- Research Report > Experimental Study (0.55)
LOSS-GAT: Label Propagation and One-Class Semi-Supervised Graph Attention Network for Fake News Detection
Lakzaei, Batool, Chehreghani, Mostafa Haghir, Bagheri, Alireza
In the era of widespread social networks, the rapid dissemination of fake news has emerged as a significant threat, inflicting detrimental consequences across various dimensions of people's lives. Machine learning and deep learning approaches have been extensively employed for identifying fake news. However, a significant challenge in identifying fake news is the limited availability of labeled news datasets. Therefore, the One-Class Learning (OCL) approach, which utilizes only a small set of labeled data from the interest class, can be a suitable way to address this challenge. In addition, representing data as a graph enables access to diverse content and structural information, and label propagation methods on graphs can be effective in predicting node labels. In this paper, we adopt a graph-based model for data representation and introduce a semi-supervised and one-class approach for fake news detection, called LOSS-GAT. Initially, we employ a two-step label propagation algorithm, utilizing Graph Neural Networks (GNNs) as an initial classifier to categorize news into two groups: interest (fake) and non-interest (real). Subsequently, we enhance the graph structure using structural augmentation techniques. Ultimately, we predict the final labels for all unlabeled data using a GNN that induces randomness within the local neighborhood of nodes through the aggregation function. We evaluate our proposed method on five common datasets and compare the results against a set of baseline models, including both OCL and binary labeled models. The results demonstrate that LOSS-GAT achieves a notable improvement, surpassing 10%, with the advantage of utilizing only a limited set of labeled fake news. Notably, LOSS-GAT even outperforms binary labeled models.
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- South America > Brazil (0.04)
- Asia > China (0.04)
Fighting Fire with Fire: The Dual Role of LLMs in Crafting and Detecting Elusive Disinformation
Lucas, Jason, Uchendu, Adaku, Yamashita, Michiharu, Lee, Jooyoung, Rohatgi, Shaurya, Lee, Dongwon
The recent ubiquity and disruptive impacts of large language models (LLMs) have raised concerns about their potential to be misused (i.e., generating large-scale harmful and misleading content). To combat this emerging risk of LLMs, we propose a novel "Fighting Fire with Fire" (F3) strategy that harnesses modern LLMs' generative and emergent reasoning capabilities to counter human-written and LLM-generated disinformation. First, we leverage GPT-3.5-turbo to synthesize authentic and deceptive LLM-generated content through paraphrase-based and perturbation-based prefix-style prompts, respectively. Second, we apply zero-shot in-context semantic reasoning techniques with cloze-style prompts to discern genuine from deceptive posts and news articles. In our extensive experiments, we observe GPT-3.5-turbo's zero-shot superiority for both in-distribution and out-of-distribution datasets, where GPT-3.5-turbo consistently achieved accuracy at 68-72%, unlike the decline observed in previous customized and fine-tuned disinformation detectors. Our codebase and dataset are available at https://github.com/mickeymst/F3.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Pennsylvania > Centre County > University Park (0.04)
- North America > United States > Massachusetts > Middlesex County > Lexington (0.04)
- Media > News (1.00)
- Health & Medicine (0.93)
RMDM: A Multilabel Fakenews Dataset for Vietnamese Evidence Verification
Nguyen, Hai-Long, Pham, Thi-Kieu-Trang, Le, Thai-Son, Nguyen, Tan-Minh, Vuong, Thi-Hai-Yen, Nguyen, Ha-Thanh
In this study, we present a novel and challenging multilabel Vietnamese dataset (RMDM) designed to assess the performance of large language models (LLMs) in verifying electronic information related to legal contexts, focusing on fake news as potential input for electronic evidence. The RMDM dataset comprises four labels: real, mis, dis, and mal, representing real information, misinformation, disinformation, and mal-information, respectively. By including these diverse labels, RMDM captures the complexities of differing fake news categories and offers insights into the abilities of different language models to handle various types of information that could be part of electronic evidence. The dataset consists of a total of 1,556 samples, with 389 samples for each label. Preliminary tests on the dataset using GPT-based and BERT-based models reveal variations in the models' performance across different labels, indicating that the dataset effectively challenges the ability of various language models to verify the authenticity of such information. Our findings suggest that verifying electronic information related to legal contexts, including fake news, remains a difficult problem for language models, warranting further attention from the research community to advance toward more reliable AI models for potential legal applications.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > Virginia (0.04)
- Asia > Vietnam > Hanoi > Hanoi (0.04)
Fake News Detectors are Biased against Texts Generated by Large Language Models
Su, Jinyan, Zhuo, Terry Yue, Mansurov, Jonibek, Wang, Di, Nakov, Preslav
The spread of fake news has emerged as a critical challenge, undermining trust and posing threats to society. In the era of Large Language Models (LLMs), the capability to generate believable fake content has intensified these concerns. In this study, we present a novel paradigm to evaluate fake news detectors in scenarios involving both human-written and LLM-generated misinformation. Intriguingly, our findings reveal a significant bias in many existing detectors: they are more prone to flagging LLM-generated content as fake news while often misclassifying human-written fake news as genuine. This unexpected bias appears to arise from distinct linguistic patterns inherent to LLM outputs. To address this, we introduce a mitigation strategy that leverages adversarial training with LLM-paraphrased genuine news. The resulting model yielded marked improvements in detection accuracy for both human and LLM-generated news. To further catalyze research in this domain, we release two comprehensive datasets, GossipCop++ and PolitiFact++, thus amalgamating human-validated articles with LLM-generated fake and real news.
- Europe > Ukraine (0.04)
- Europe > United Kingdom (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
X-CapsNet For Fake News Detection
Goldani, Mohammad Hadi, Safabakhsh, Reza, Momtazi, Saeedeh
News consumption has significantly increased with the growing popularity and use of web-based forums and social media. This sets the stage for misinforming and confusing people. To help reduce the impact of misinformation on users' potential health-related decisions and other intents, it is desirable to have machine learning models that detect and combat fake news automatically. This paper proposes a novel transformer-based model using Capsule Neural Networks (CapsNet) called X-CapsNet. This model includes a CapsNet with a dynamic routing algorithm parallelized with a size-based classifier for detecting short and long fake news statements. We use two size-based classifiers, a Deep Convolutional Neural Network (DCNN) for detecting long fake news statements and a Multi-Layer Perceptron (MLP) for detecting short news statements. To resolve the problem of representing short news statements, we use indirect features of news created by concatenating the vector of news speaker profiles and a vector of polarity, sentiment, and word counts of news statements. For evaluating the proposed architecture, we use the Covid-19 and the Liar datasets. The results in terms of the F1-score for the Covid-19 dataset and accuracy for the Liar dataset show that the models perform better than the state-of-the-art baselines.
- North America > United States (0.14)
- Europe > Italy (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- Asia > India (0.04)
- Media > News (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
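Two ideas from the X-CapsNet abstract, concatenating indirect features and dispatching statements to a size-based classifier, can be sketched in a few lines. This is an illustrative sketch only, not the authors' architecture: the function names, the example vectors, and the 30-word threshold separating short from long statements are all hypothetical, and the classifiers themselves are represented only by branch names.

```python
def build_indirect_features(speaker_profile, polarity, sentiment, statement):
    """Concatenate a speaker-profile vector with simple statement statistics."""
    word_count = len(statement.split())
    return list(speaker_profile) + [polarity, sentiment, float(word_count)]


def choose_branch(statement, long_threshold=30):
    """Pick which size-based classifier would handle the statement.

    long_threshold is a hypothetical cut-off in words, not a value from the paper.
    """
    is_long = len(statement.split()) >= long_threshold
    return "dcnn_long" if is_long else "mlp_short"


# Hypothetical inputs: a 3-dimensional speaker profile and hand-picked scores.
statement = "The city council approved the new budget on Monday"
features = build_indirect_features(
    [0.1, 0.7, 0.2], polarity=-0.4, sentiment=-0.6, statement=statement
)
branch = choose_branch(statement)
```

In a fuller pipeline the `features` vector would feed the short-statement classifier, while long statements would be routed to the convolutional branch.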
LTCR: Long-Text Chinese Rumor Detection Dataset
Ma, Ziyang, Liu, Mengsha, Fang, Guian, Shen, Ying
False information can spread quickly on social media, negatively influencing citizens' behaviors and responses to social events. To better detect fake news, especially long texts, which are harder to detect completely, a Long-Text Chinese Rumor detection dataset named LTCR is proposed. The LTCR dataset provides a valuable resource for accurately detecting misinformation, especially in the context of complex fake news related to COVID-19. The dataset consists of 1,729 and 500 pieces of real and fake news, respectively. The average lengths of real and fake news are approximately 230 and 152 characters, respectively. We also propose a Salience-aware Fake News Detection Model, which achieves the highest accuracy (95.85%), fake news recall (90.91%) and F-score (90.60%) on the dataset. (https://github.com/Enderfga/DoubleCheck)
- Asia > China > Hubei Province > Wuhan (0.05)
- Asia > Pakistan > Islamabad Capital Territory > Islamabad (0.04)
- Europe > Middle East > Cyprus (0.04)
- Media > News (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
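The accuracy, fake news recall, and F-score reported in the LTCR abstract are standard confusion-matrix metrics. The sketch below shows how they are typically computed, treating fake news as the positive class and assuming the F-score is the usual F1. The counts are hypothetical toy values, not figures from the paper.

```python
def fake_news_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 with fake news as positive.

    tp: fake items flagged as fake      fp: real items flagged as fake
    fn: fake items missed               tn: real items passed as real
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a 200-item test set (illustrative only).
metrics = fake_news_metrics(tp=40, fp=10, fn=10, tn=140)
```

On an imbalanced dataset such as one with far more real than fake items, accuracy alone can look high even when recall on the fake class is poor, which is why the class-specific recall and F-score are reported alongside it.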
Fake News Detection and Behavioral Analysis: Case of COVID-19
Li, Chih-Yuan, Kollapally, Navya Martin, Chun, Soon Ae, Geller, James
While the world has been combating COVID-19 for over three years, an ongoing "Infodemic" due to the spread of fake news regarding the pandemic has also been a global issue. The existence of fake news impacts different aspects of our daily lives, including politics, public health, economic activities, etc. Readers could mistake fake news for real news, and consequently have less access to authentic information. This phenomenon will likely cause confusion among citizens and conflicts in society. Currently, there are major challenges in fake news research. It is challenging to accurately identify fake news data in social media posts. Timely human identification is infeasible as the amount of fake news data is overwhelming. Besides, topics discussed in fake news are hard to identify due to their similarity to real news. The goal of this paper is to identify fake news on social media to help stop its spread. We present Deep Learning approaches and an ensemble approach for fake news detection. Our detection models achieved higher accuracy than previous studies. The ensemble approach further improved the detection performance. We discovered feature differences between fake news and real news items. When we added them to the sentence embeddings, we found that they affected the model performance. We applied a hybrid method and built models for recognizing topics from posts. We found that half of the identified topics overlapped between fake news and real news, which could increase confusion in the population.
- Asia > India (0.05)
- Europe > Ukraine (0.04)
- Asia > Middle East > Iran (0.04)
- Media > News (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)