fake review
An Optimized Machine Learning Classifier for Detecting Fake Reviews Using Extracted Features
Anees, Shabbir, Anshuman, null, Chaurasia, Ayush, Bogar, Prathmesh
It is well known that fraudulent reviews cast doubt on the legitimacy and dependability of online purchases. The most recent development misleading customers is the appearance of computer-generated (CG) reviews that pass for human-written ones. In this work, we present an advanced machine-learning-based system that detects these AI-generated reviews with high precision. Our method integrates advanced text preprocessing, multi-modal feature extraction, Harris Hawks Optimization (HHO) for feature selection, and a stacking ensemble classifier. We implemented this methodology on a public dataset of 40,432 Original (OR) and Computer-Generated (CG) reviews. From an initial set of 13,539 features, HHO selected the 1,368 most relevant features, an 89.9% dimensionality reduction. Our final stacking model achieved 95.40% accuracy, 92.81% precision, 95.01% recall, and a 93.90% F1-Score, which demonstrates that combining ensemble learning with bio-inspired optimisation is an effective method for recognising machine-generated text. Because large-scale review analytics commonly run on cloud platforms, privacy-preserving techniques such as differential privacy and secure outsourcing are essential to protect user data in these systems.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Overview (0.66)
- Research Report (0.51)
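The headline figures in this abstract are internally consistent, and a few lines of Python can cross-check them. The sketch recomputes the dimensionality reduction from the two feature counts. It also recomputes the F1-Score as the harmonic mean of the reported precision and recall. The helper names `f1_score` and `reduction` are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def reduction(initial: int, selected: int) -> float:
    """Fraction of the original features discarded by feature selection."""
    return 1 - selected / initial


# Figures quoted in the abstract above.
f1 = f1_score(0.9281, 0.9501)
cut = reduction(13_539, 1_368)
print(f"F1 = {f1:.2%}, reduction = {cut:.1%}")  # F1 = 93.90%, reduction = 89.9%
```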
Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection
Şenol, Ali, Agrawal, Garima, Liu, Huan
Detecting deceptive conversations on dynamic platforms is increasingly difficult due to evolving language patterns and Concept Drift (CD), i.e., semantic or topical shifts that alter the context or intent of interactions over time. These shifts can obscure malicious intent or mimic normal dialogue, making accurate classification challenging. While Large Language Models (LLMs) show strong performance in natural language tasks, they often struggle with contextual ambiguity and hallucinations in risk-sensitive scenarios. To address these challenges, we present a Domain Knowledge (DK)-Enhanced LLM framework that integrates pretrained LLMs with structured, task-specific insights to perform fraud and concept drift detection. The proposed architecture consists of three main components: (1) a DK-LLM module to detect fake or deceptive conversations; (2) a drift detection unit (OCDD) to determine whether a semantic shift has occurred; and (3) a second DK-LLM module to classify the drift as either benign or fraudulent. We first validate the value of domain knowledge using a fake review dataset and then apply our full framework to SEConvo, a multi-turn dialogue dataset that includes various types of fraud and spam attacks. Results show that our system detects fake conversations with high accuracy and effectively classifies the nature of drift. Guided by structured prompts, the LLaMA-based implementation achieves 98% classification accuracy. Comparative studies against zero-shot baselines demonstrate that incorporating domain knowledge and drift awareness significantly improves performance, interpretability, and robustness in high-stakes NLP applications.
- North America > United States > Arizona > Maricopa County > Tempe (0.04)
- Europe > Italy > Sicily (0.04)
- Asia > Middle East > Republic of Türkiye > Mersin Province > Mersin (0.04)
- Overview (1.00)
- Research Report > New Finding (0.88)
- Information Technology > Security & Privacy (0.94)
- Law Enforcement & Public Safety > Fraud (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
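The three-component architecture described in this abstract can be sketched as plain control flow. This is a minimal sketch, not the authors' implementation. The DK-LLM modules and the OCDD unit are replaced by hypothetical callables (`detect_fake`, `detect_drift`, `classify_drift`). Two further assumptions are ours: every stage sees the whole conversation, and drift is only classified once it has been detected.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical stand-ins for the paper's modules (assumption, not its API).
FakeDetector = Callable[[List[str]], bool]     # DK-LLM: is the conversation fake?
DriftDetector = Callable[[List[str]], bool]    # OCDD role: did a semantic shift occur?
DriftClassifier = Callable[[List[str]], str]   # second DK-LLM: "benign" or "fraudulent"


@dataclass
class Verdict:
    is_fake: bool
    drift: bool
    drift_type: Optional[str]  # None when no drift was detected


def analyse(
    turns: List[str],
    detect_fake: FakeDetector,
    detect_drift: DriftDetector,
    classify_drift: DriftClassifier,
) -> Verdict:
    is_fake = detect_fake(turns)    # component 1: fake-conversation check
    drift = detect_drift(turns)     # component 2: drift detection
    # component 3: only classify drift that was actually detected (assumption)
    drift_type = classify_drift(turns) if drift else None
    return Verdict(is_fake, drift, drift_type)


# Toy keyword rules standing in for the LLM-based modules.
v = analyse(
    ["Hi, I'm calling from your bank.", "Please confirm your PIN."],
    detect_fake=lambda t: any("PIN" in x for x in t),
    detect_drift=lambda t: len(t) > 1,
    classify_drift=lambda t: "fraudulent",
)
print(v)
```

Keeping each component behind a callable makes it easy to swap a zero-shot prompt for a domain-knowledge prompt, which mirrors the comparison the abstract reports.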
Large Language Models as 'Hidden Persuaders': Fake Product Reviews are Indistinguishable to Humans and Machines
Meng, Weiyao, Harvey, John, Goulding, James, Carter, Chris James, Lukinova, Evgeniya, Smith, Andrew, Frobisher, Paul, Forrest, Mina, Nica-Avram, Georgiana
Reading and evaluating product reviews is central to how most people decide what to buy and consume online. However, the recent emergence of Large Language Models and Generative Artificial Intelligence means that writing fraudulent or fake reviews is potentially easier than ever. Through three studies we demonstrate three findings. First, humans are no longer able to distinguish between real product reviews and fake ones generated by machines, averaging only 50.8% accuracy overall, essentially what would be expected by chance alone. Second, LLMs are likewise unable to distinguish between fake and real reviews and perform equally badly or even worse than humans. Third, humans and LLMs pursue different strategies for evaluating authenticity, which lead to equally poor accuracy but different precision, recall and F1 scores, indicating that they perform worse at different aspects of judgment. The results reveal that review systems everywhere are now susceptible to mechanised fraud if they do not depend on trustworthy purchase verification to guarantee the authenticity of reviewers. Furthermore, the results provide insight into the consumer psychology of how humans judge authenticity, demonstrating an inherent 'scepticism bias' towards positive reviews and a special vulnerability to misjudging the authenticity of fake negative reviews. Additionally, the results provide a first insight into the 'machine psychology' of judging fake reviews, revealing that the strategies LLMs use to evaluate authenticity differ radically from those of humans, in ways that are equally wrong in terms of accuracy but different in their misjudgments.
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
- Health & Medicine (0.67)
- Information Technology > Services (0.46)
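Finding (3) rests on the fact that two judges can share the same accuracy while differing sharply in precision, recall and F1. The sketch illustrates this with two invented confusion matrices over 100 reviews (50 fake, 50 real, with "fake" as the positive class). These counts are hypothetical and are not data from the studies.

```python
def metrics(tp: int, fp: int, fn: int, tn: int):
    """Return (accuracy, precision, recall, F1) for the positive class."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1


# Hypothetical judges: one labels most reviews fake, the other most reviews real.
eager = metrics(tp=40, fp=39, fn=10, tn=11)
wary = metrics(tp=11, fp=10, fn=39, tn=40)

for name, (acc, prec, rec, f1) in [("eager", eager), ("wary", wary)]:
    print(f"{name}: acc={acc:.2f} prec={prec:.2f} rec={rec:.2f} f1={f1:.2f}")
```

Both judges score 51% accuracy, close to chance. The first trades precision for recall, while the second does the opposite, so their F1 scores diverge.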
Data Augmentation for Fake Reviews Detection in Multiple Languages and Multiple Domains
With the growth of the Internet, buying habits have changed, and customers have become more dependent on the online opinions of other customers to guide their purchases. Identifying fake reviews has thus become an important area of Natural Language Processing (NLP) research. However, developing high-performance NLP models depends on the availability of large amounts of training data, which are often unavailable for low-resource languages or domains. In this research, we used large language models to generate datasets for training fake review detectors. We used this approach to generate fake reviews in different domains (book reviews, restaurant reviews, and hotel reviews) and different languages (English and Chinese). Our results demonstrate that our data augmentation techniques improve fake review detection performance for all domains and languages. Using the augmented datasets improves the accuracy of our fake review detection model by 0.3 percentage points on DeRev TEST, 10.9 percentage points on Amazon TEST, 8.3 percentage points on Yelp TEST and 7.2 percentage points on DianPing TEST.
- Asia > Middle East > Iraq (0.04)
- Asia > Afghanistan (0.04)
- North America > United States (0.04)
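The augmentation idea in this abstract, enlarging a training set with LLM-generated fake reviews, can be sketched as follows. This is a minimal sketch under stated assumptions. `generate_fake` is a hypothetical wrapper around an LLM prompt, and seeding each generation with a genuine review is our choice, not necessarily the paper's. The function expects at least one review labelled "real".

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (review text, label), label is "real" or "fake"


def augment(
    train: List[Example],
    generate_fake: Callable[[str], str],
    n_new: int,
    seed: int = 0,
) -> List[Example]:
    """Return the training pairs plus n_new synthetic reviews labelled "fake".

    generate_fake is a hypothetical LLM wrapper (assumption); it receives a
    genuine review as a seed and returns one synthetic review.
    """
    rng = random.Random(seed)  # reproducible choice of seed reviews
    genuine = [text for text, label in train if label == "real"]
    synthetic = [(generate_fake(rng.choice(genuine)), "fake") for _ in range(n_new)]
    return train + synthetic


train = [("Lovely room, friendly staff.", "real"), ("Best hotel ever!!!", "fake")]
# Toy generator standing in for an LLM call.
augmented = augment(train, generate_fake=lambda s: "Rewritten: " + s, n_new=3)
print(len(augmented))
```

In a low-resource domain or language, the same loop would call a model prompted in that language. The detector is then trained on `augmented` rather than `train`.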
Google agrees to changes to tackle fake reviews for businesses
It is not the first pledge to tackle fake reviews, a problem which artificial intelligence (AI) is exacerbating. Amazon and Google have been under investigation by the CMA over fake reviews since June 2021, months after the consumer group Which? The CMA has said its investigation into Amazon is ongoing. Rocio Concha, the director of policy and advocacy at Which?, said: "The changes should help prevent consumers from being misled by unscrupulous businesses and fake review brokers."
What Matters in Explanations: Towards Explainable Fake Review Detection Focusing on Transformers
Shajalal, Md, Atabuzzaman, Md, Boden, Alexander, Stevens, Gunnar, Du, Delong
Customers' reviews and feedback play a crucial role on electronic commerce (e-commerce) platforms like Amazon, Zalando, and eBay in influencing other customers' purchasing decisions. However, there is a prevailing concern that sellers often post fake or spam reviews to deceive potential customers and manipulate their opinions about a product. Over the past decade, there has been considerable interest in using machine learning (ML) and deep learning (DL) models to identify such fraudulent reviews. Unfortunately, the decisions made by complex ML and DL models, which often function as black boxes, can be surprising and difficult for general users to comprehend. In this paper, we propose an explainable framework that detects fake reviews with high precision and explains its decisions. Through an empirical user evaluation, we also investigate what information matters most for explaining particular decisions. Initially, we develop fake review detection models using DL and transformer models, including XLNet and DistilBERT. We then introduce the layer-wise relevance propagation (LRP) technique to generate explanations that map the contribution of each word to the predicted class. The experimental results on two benchmark fake review detection datasets demonstrate that our predictive models achieve state-of-the-art performance and outperform several existing methods. Furthermore, the empirical user evaluation of the generated explanations reveals which information needs to be considered when generating explanations in the context of fake review identification.
- North America > United States > Virginia (0.04)
- Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Siegen (0.04)
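Layer-wise relevance propagation redistributes a prediction backwards so that each input receives a share of it. The sketch applies the common epsilon rule, R_i = sum_j a_i w_ij / (z_j + eps * sign(z_j)) * R_j, to a single dense layer. This is a simplified illustration only. The paper applies LRP to transformer models such as XLNet and DistilBERT, whereas here the bag-of-words weights for a "fake" logit are invented and there is no bias term.

```python
from typing import List


def lrp_linear(
    activations: List[float],
    weights: List[List[float]],
    relevance_out: List[float],
    eps: float = 1e-6,
) -> List[float]:
    """Epsilon-rule LRP through one dense layer (no bias).

    activations[i]: input a_i; weights[i][j]: weight from input i to output j;
    relevance_out[j]: relevance R_j of output j. Returns relevance per input.
    """
    n_out = len(relevance_out)
    # Pre-activations z_j = sum_i a_i * w_ij
    z = [sum(a * w[j] for a, w in zip(activations, weights)) for j in range(n_out)]
    # Stabiliser keeps the denominator away from zero while preserving its sign.
    stab = [zj + (eps if zj >= 0 else -eps) for zj in z]
    return [
        sum(a * w[j] / stab[j] * relevance_out[j] for j in range(n_out))
        for a, w in zip(activations, weights)
    ]


# Hypothetical bag-of-words input and weights for a single "fake" logit.
tokens = ["amazing", "product", "buy", "now"]
counts = [1.0, 1.0, 1.0, 1.0]
w_fake = [[0.9], [0.1], [0.6], [0.4]]
logit = sum(c * w[0] for c, w in zip(counts, w_fake))
relevance = lrp_linear(counts, w_fake, [logit])
for token, r in zip(tokens, relevance):
    print(f"{token}: {r:.3f}")
```

With no bias and a tiny epsilon, the word relevances sum almost exactly to the logit. This conservation property is what makes LRP scores readable as per-word contributions.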
Enhanced Review Detection and Recognition: A Platform-Agnostic Approach with Application to Online Commerce
Karmakar, Priyabrata, Hawkins, John
Online commerce relies heavily on user-generated reviews to give customers unbiased information about products they have not physically seen. The importance of reviews has attracted multiple exploitative online behaviours, creating a need for methods that monitor and detect reviews. We present a machine learning methodology for review detection and extraction, and demonstrate that it generalises to websites that were not in the training data. This method promises to drive applications for automatic detection and evaluation of reviews, regardless of their source. Furthermore, we showcase the versatility of our method by implementing and discussing three key applications for analysing reviews. The first is Sentiment Inconsistency Analysis, which detects and filters out unreliable reviews based on inconsistencies between ratings and comments. The second is Multi-language support, which enables the extraction and translation of reviews from various languages without relying on HTML scraping. The third is Fake review detection, achieved by integrating a trained NLP model that distinguishes genuine reviews from fake ones.
- Oceania > Australia (0.04)
- South America (0.04)
- North America > Central America (0.04)
- Research Report (0.50)
- Overview (0.46)
- Information Technology > Services (0.48)
- Retail (0.46)
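The Sentiment Inconsistency Analysis application flags reviews whose star rating and comment disagree. The sketch shows the idea, assuming a 1-5 star scale. A tiny hand-written lexicon stands in for a real sentiment model, and the word lists, thresholds and function names are all hypothetical.

```python
# Toy lexicon standing in for a trained sentiment model (assumption).
POSITIVE = {"great", "excellent", "love", "perfect", "good"}
NEGATIVE = {"terrible", "broken", "awful", "refund", "bad"}


def text_sentiment(text: str) -> int:
    """Return +1, -1 or 0 from positive minus negative word counts."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def rating_sentiment(stars: int) -> int:
    """Map a 1-5 star rating to -1 (1-2 stars), 0 (3 stars) or +1 (4-5 stars)."""
    return 1 if stars >= 4 else -1 if stars <= 2 else 0


def is_inconsistent(stars: int, text: str) -> bool:
    """Flag reviews whose rating and comment point in opposite directions."""
    return rating_sentiment(stars) * text_sentiment(text) < 0


print(is_inconsistent(5, "Terrible, arrived broken. I want a refund."))
print(is_inconsistent(5, "Great product, love it"))
```

Neutral ratings or neutral text never trigger the flag, so only clear contradictions are filtered. A production system would replace `text_sentiment` with a proper classifier.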
Finding fake reviews in e-commerce platforms by using hybrid algorithms
Periasamy, Mathivanan, Mahadevan, Rohith, S, Bagiya Lakshmi, Raman, Raja CSP, S, Hasan Kumar, Jessiman, Jasper
Sentiment analysis, a vital component of natural language processing, plays a crucial role in understanding the emotions and opinions expressed in textual data. In this paper, we propose an innovative ensemble approach to sentiment analysis for finding fake reviews. It amalgamates the predictive capabilities of Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Decision Tree classifiers. Our ensemble architecture strategically combines these diverse models to capitalize on their strengths while mitigating their inherent weaknesses, thereby achieving superior accuracy and robustness in fake review prediction. Combining the classifiers boosts predictive performance and also fosters adaptability to the varied linguistic patterns and nuances present in real-world datasets. The evaluation metrics on fake reviews demonstrate the efficacy and competitiveness of the proposed ensemble method against traditional single-model approaches. Our findings underscore the potential of ensemble techniques to advance the state of the art in finding fake reviews using hybrid algorithms. This has implications for social media and other online platforms seeking to surface trustworthy reviews and disregard fake ones, eliminating puffery and bluffs.
- Europe > Switzerland > Basel-City > Basel (0.04)
- Asia > Singapore (0.04)
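The abstract does not say how the SVM, KNN and Decision Tree outputs are combined. The sketch therefore assumes hard majority voting over each model's predicted labels, one common choice. The prediction lists below are invented. For fitted estimators, scikit-learn's `VotingClassifier` with `voting="hard"` performs the same combination.

```python
from collections import Counter
from typing import List


def majority_vote(*predictions: List[str]) -> List[str]:
    """Combine per-model label lists by hard majority voting (assumed scheme)."""
    combined = []
    for labels in zip(*predictions):  # one tuple of labels per review
        label, _ = Counter(labels).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from the three base classifiers on four reviews.
svm = ["fake", "real", "fake", "real"]
knn = ["fake", "fake", "real", "real"]
tree = ["real", "fake", "fake", "real"]
print(majority_vote(svm, knn, tree))  # ['fake', 'fake', 'fake', 'real']
```

With three voters and two labels a tie is impossible, which is one practical reason to use an odd number of base models under hard voting.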
Google Gemini invented fake reviews smearing my book about Big Tech's political biases
Google Gemini, the tech giant's new AI chatbot meant to rival ChatGPT, invented several fake reviews – which it attributed to real people – meant to discredit my 2020 book on political biases at Google and other big tech companies. On Sunday, amid a sharp backlash against Google over its AI program's apparent political biases, I asked Gemini to explain what my book was about. My book, "The Manipulators: Facebook, Google, Twitter, and Big Tech's War on Conservatives," was a multi-year project on Big Tech's political biases that drew on inside sources, leaked documents and more. I was curious to see if Google's AI program could be trusted to accurately describe an investigative book about Google, but I wasn't prepared for just how misleading it would be.
AiGen-FoodReview: A Multimodal Dataset of Machine-Generated Restaurant Reviews and Images on Social Media
Gambetti, Alessandro, Han, Qiwei
Online reviews in the form of user-generated content (UGC) significantly impact consumer decision-making. However, the pervasive issue of fake content, produced not only by humans but also by machines, challenges UGC's reliability. Recent advances in Large Language Models (LLMs) may make it possible to fabricate indistinguishable fake content at a much lower cost. Leveraging OpenAI's GPT-4-Turbo and DALL-E-2 models, we craft AiGen-FoodReview, a multimodal dataset of 20,144 restaurant review-image pairs divided into authentic and machine-generated. We explore unimodal and multimodal detection models, achieving 99.80% multimodal accuracy with FLAVA. We use attributes from readability and photographic theories to score reviews and images, respectively, demonstrating their utility as hand-crafted features in scalable and interpretable detection models with comparable performance. The paper contributes by open-sourcing the dataset and releasing fake review detectors, recommending the dataset's use in unimodal and multimodal fake review detection tasks, and evaluating linguistic and visual features in synthetic versus authentic data.
- Europe > Portugal > Lisbon > Lisbon (0.14)
- North America > United States > New York (0.04)
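This abstract mentions scoring reviews with readability attributes. The sketch computes one such score, the Flesch Reading Ease formula 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words), as a single hand-crafted feature. The abstract does not say this is one of the features the authors used, so treat it as an assumed example. The syllable counter is also a rough vowel-group heuristic of our own, not a dictionary lookup.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def score_review(text: str) -> float:
    """Score one review; returns 0.0 for text without any words (our choice)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    if not tokens:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(t) for t in tokens)
    return flesch_reading_ease(len(tokens), sentences, syllables)


print(round(score_review("The pasta was great. We will come back soon!"), 1))
```

Scores like this are cheap to compute and easy to interpret, which is the appeal of hand-crafted features next to large multimodal models.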