Leveraging GPT-2 for Classifying Spam Reviews with Limited Labeled Data via Adversarial Training

Irissappane, Athirai A., Yu, Hanfei, Shen, Yankun, Agrawal, Anubha, Stanton, Gray

Dec-24-2020–arXiv.org Artificial Intelligence

Online reviews are a vital source of information when purchasing a service or a product. Opinion spammers manipulate these reviews, deliberately altering the overall perception of the service. Although a corpus of online reviews exists, only a few reviews have been labeled as spam or non-spam, making it difficult to train spam detection models. We propose an adversarial training mechanism leveraging the capabilities of Generative Pre-Training 2 (GPT-2) for classifying opinion spam with limited labeled data and a large set of unlabeled data. Experiments on the TripAdvisor and YelpZip datasets show that the proposed model outperforms state-of-the-art techniques by at least 7% in terms of accuracy when labeled data is limited. The proposed model can also generate synthetic spam/non-spam reviews with reasonable perplexity, thereby providing additional labeled data during training.

Country:
- North America > United States
  - Colorado (0.04)
  - Washington > Pierce County
    - Tacoma (0.04)
  - Illinois > Cook County
    - Chicago (0.04)

Genre:
- Research Report > Promising Solution (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Neural Networks > Deep Learning (1.00)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.68)
