The Efficiency of Pre-training with Objective Masking in Pseudo Labeling for Semi-Supervised Text Classification
Arezoo Hatefi, Xuan-Son Vu, Monowar Bhuyan, Frank Drewes
arXiv.org Artificial Intelligence
We extend and study a semi-supervised model for text classification, proposed earlier by Hatefi et al., for classification tasks in which document classes are described by a small number of gold-labeled examples while the majority of training examples are unlabeled. The model leverages the teacher-student architecture of Meta Pseudo Labels, in which a "teacher" generates labels for the originally unlabeled training data to train the "student", and iteratively updates its own model based on the student's performance on the gold-labeled portion of the data. We extend the original model of Hatefi et al. with an unsupervised pre-training phase based on objective masking, and conduct in-depth performance evaluations of the original model, our extension, and various independent baselines. Experiments are performed on three datasets in two languages (English and Swedish).
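The teacher-student feedback loop summarized above can be illustrated with a deliberately tiny sketch. The actual model trains neural networks with gradient-based Meta Pseudo Labels updates; here, purely as an illustration, both teacher and student are 1-D threshold classifiers, and the teacher is updated by a simple accept/reject perturbation judged by the student's accuracy on the gold-labeled set. All names and the update rule are assumptions for this toy, not the paper's implementation.

```python
import random

def fit_student(xs, labels):
    # Student "training": pick the threshold that minimizes error on the
    # teacher's pseudo-labels (a toy stand-in for SGD on a neural network).
    best_t, best_err = 0.0, float("inf")
    for t in list(xs) + [0.0]:
        err = sum((x > t) != y for x, y in zip(xs, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def accuracy(t, xs, ys):
    return sum((x > t) == y for x, y in zip(xs, ys)) / len(xs)

def meta_pseudo_labels(unlabeled, gold_x, gold_y, steps=50, seed=0):
    rng = random.Random(seed)
    teacher = 0.0
    for _ in range(steps):
        pseudo = [x > teacher for x in unlabeled]   # teacher labels the pool
        student = fit_student(unlabeled, pseudo)    # student trains on pseudo-labels
        # Teacher feedback (illustrative accept/reject rule, not the paper's
        # gradient update): try a perturbed teacher and keep it if the
        # resulting student does at least as well on the gold-labeled data.
        cand = teacher + rng.gauss(0, 0.5)
        cand_student = fit_student(unlabeled, [x > cand for x in unlabeled])
        if accuracy(cand_student, gold_x, gold_y) >= accuracy(student, gold_x, gold_y):
            teacher = cand
    return teacher, fit_student(unlabeled, [x > teacher for x in unlabeled])
```

Because a perturbed teacher is only accepted when its student matches or improves gold-set accuracy, the student's gold accuracy is non-decreasing over the loop; this mirrors, very loosely, how the real teacher is steered by student feedback on the labeled data.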
May-13-2025