Confounds and Overestimations in Fake Review Detection: Experimentally Controlling for Product-Ownership and Data-Origin

Felix Soldner, Bennett Kleinberg, Shane Johnson

arXiv.org Artificial Intelligence 

The popularity of online shopping is steadily increasing. At the same time, fake product reviews are published widely and have the potential to affect consumer purchasing behavior. In response, previous work has developed automated methods that use natural language processing to detect fake product reviews. However, studies vary considerably in how well they detect deceptive reviews, and the reasons for these differences are unclear. A contributing factor may be the multitude of strategies used to collect data, which introduce potential confounds that affect detection performance. Two possible confounds are data origin (i.e., the dataset is composed of more than one source) and product ownership (i.e., whether reviews are written by individuals who own or do not own the reviewed product). In the present study, we investigate the effect of both confounds on fake review detection. Using an experimental design, we manipulate data origin, product ownership, review polarity, and veracity. Supervised learning analysis suggests that review veracity alone (60.26-69.87%) is somewhat detectable, but reviews additionally confounded with product ownership (66.19-74.17%) or with data origin (84.44-86.94%) are easier to classify. Review veracity is most easily classified when confounded with product ownership and data origin combined (87.78-88.12%). These findings are moderated by review polarity.
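The supervised learning setup described in the abstract can be illustrated with a minimal sketch. The example below is not the authors' method or data: it is a toy multinomial Naive Bayes classifier over bag-of-words features, with invented example reviews, intended only to show the general shape of a text-based veracity classifier.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately simple stand-in
    for the richer NLP features a real detection study would use)."""
    return text.lower().split()

class NaiveBayesReviewClassifier:
    """Minimal multinomial Naive Bayes with Laplace smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for text, label in zip(texts, labels):
            toks = tokenize(text)
            self.word_counts[label].update(toks)
            self.vocab.update(toks)
        n = len(labels)
        self.priors = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.priors[c]
            for tok in tokenize(text):
                # Add-one (Laplace) smoothing for unseen words
                score += math.log(
                    (self.word_counts[c][tok] + 1) / (self.totals[c] + v)
                )
            scores[c] = score
        return max(scores, key=scores.get)

# Invented toy training data, labeled "fake" vs. "real" for illustration only
texts = [
    "amazing best product ever buy now",
    "best ever amazing must buy",
    "the battery lasted two days and the case scratched",
    "battery ok case scratched after a week",
]
labels = ["fake", "fake", "real", "real"]

clf = NaiveBayesReviewClassifier().fit(texts, labels)
print(clf.predict("amazing best product"))  # → fake
```

In the study itself, classifiers of this kind are trained and evaluated on datasets where veracity is either isolated or deliberately confounded with product ownership and data origin, which is what produces the differing accuracy ranges reported above.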
